THE DEVELOPMENT AND EVALUATION OF
AN INSTRUMENT TO ASSESS THE ATTITUDES
OF PUBLIC SCHOOL PRINCIPALS


FRANK A. SCOTT* 
East Carolina College 
Greenville, North Carolina 


SECTION I 


Introduction 


THE COLLEGE of Education of the University of Georgia, with the
assistance of the W. K. Kellogg Foundation, has made a series of
exhaustive studies of the principalship in the State of Georgia.
Using the critical incident technique, these studies identified the
critical requirements for the principalship by collecting instances
of behavior deemed critical to the principal's effective performance
of his duties. The author believed that these initial studies should
serve as bases for the development of an instrument which might be
used to evaluate the probable performance of public school principals.

An instrument of this type should be valuable because public school
superintendents have long been confronted with the problem of
selecting effective public school principals. Colleges of education
have helped to solve this problem through screening procedures,
appropriate training, and continuous evaluation of students studying
to become principals. Both superintendents of schools and colleges of
education should give increased attention to screening procedures in
order to select those applicants who have the characteristics deemed
necessary for performing effectively the work of the public school
principalship.


The Problem 


The purpose of this study is the development and evaluation of an
instrument which may aid in screening applicants for principals’
positions. The development of this instrument involved the testing of
the following hypothesis: the professional attitudes of principals
are significantly related to the rated effectiveness of principals in
performing the duties of the principalship. To test this hypothesis
the writer attempted to construct a measuring device which would
differentiate between ‘‘most effective’’ and ‘‘least effective’’
principals by isolating certain characteristic attitudes of each
group.

The writer also decided to attempt to deter- 
mine the extent to which certain personal and 
professional characteristics of principals are re- 
lated to levels of rated effectiveness in over-all 
school administration. These characteristics 
were: sex of respondent, age of respondent, type 
of school of which respondent is principal, length 
of experience of respondent as a teacher, length 
of experience of respondent as a principal, non- 
school experience of respondent, and number of 
positions respondent has held within the past ten 
years. 

The objectives of this project were accom- 
plished by carrying out the following procedures: 


1. The development of statements which would reflect principals’
attitudes toward problems related to the principalship.

2. The selection of two distinct groups of principals in terms of
rated effectiveness in over-all school administration.

3. The administration of these attitude statements to the selected
principals so as to determine the differences in responses made by
the two groups rated ‘‘most effective’’ and ‘‘least effective’’ in
terms of over-all school administrative abilities.

4. The analysis of individual test items so as to determine the
discriminatory power of each item with regard to levels of rated
effectiveness.

5. The determination of the validity and reliability of the attitude
items which show discriminatory power with regard to levels of rated
effectiveness.

6. The determination of certain personal and professional
characteristics of principals and their relationship to levels of
rated effectiveness in terms of over-all school administrative
abilities.


*The writer wishes to express his sincere appreciation to the many friends and associates who aided him
in this study. The help of the following persons is particularly acknowledged: Dr. Joseph Bledsoe,
the writer's major professor; Dr. John A. Dotson, Dean, College of Education, and Dr. James E. Greene,
Chairman of Graduate Studies, University of Georgia, for their guidance and supervision; Dr. Doyne
Smith and other members of the Kellogg Staff, University of Georgia, for their encouragement; and to
the superintendents, supervisors, visiting teachers, and principals of the Georgia schools for their
splendid cooperation.


SECTION II 
METHODS OF PROCEDURE 


Procedures for Constructing Test Items 





THE FORMAT of the test decided upon was similar to the one devised by
Likert as reported by Cronbach.1* This technique consists of
constructing a series of statements to which the subject can respond
on a five-point scale of agreement.

The source of the statements was the Composite List of Critical
Requirements2 which grew out of six doctoral studies of the
principalship at the University of Georgia. A sample page of this
Composite List is found in Appendix A (original manuscript). Each of
the 198 critical requirements in the Composite List was traced back
to the original data in order to match that requirement with the
critical behaviors. A critical behavior that appeared to be typical
of the behaviors identified under that composite requirement was
selected as the basis for a test item. The behavior was expressed in
terms of a statement to which responses could be made on a four-point
scale. This scale consisted of ‘‘strongly agree’’, ‘‘mildly agree’’,
‘‘mildly disagree’’, and ‘‘strongly disagree’’. ‘‘Undecided’’, which
was included in Likert’s3 original scale, was omitted because the
author felt that principals should be asked either to agree or
disagree with each statement. Statements for each of the 198
composite requirements were constructed.

The next step in the construction of the initial 
instrument was the editing of the statements. 
Three members of the College of Education staff 
were each asked to edit one-third of the state- 
ments. The statements were edited in terms of 
structure, ambiguity, and preciseness. 

The 198 statements were then compiled into a provisional attitude
scale. This scale was administered to five staff members in
Educational Administration, University of Georgia. These staff
members were asked to give suggestions for improving the instrument.
Revisions were then made in light of these suggestions. The first
page of the instrument, as thus revised, is shown in Appendix B
(original manuscript).


*All footnotes will be found at end of article.


Procedures for Selecting Respondent Groups 





The respondent groups were selected from 
school systems in the State of Georgia employing 
seven or more white principals. The groups were 
further limited to those systems whose respon- 
sible officials agreed to participate in the pro- 
ject. The seventy-one school systems in Georgia, 
which had seven or more principals during the 
school year 1956-57, were requested to cooper- 
ate. Of this number, thirty-nine systems agreed 
to participate by rating their principals. This 
rating was necessary in order to select two dis- 
tinct groups of principals in terms of rated effec- 
tiveness in over-all school administration. 

Two ratings were obtained on each principal in the thirty-nine
systems. The ratings were made by superintendents of schools and
supervisors. In systems not employing a supervisor the visiting
teacher made the rating. A list of names of principals in each
participating county was mailed to each rater. The names were taken
from the Georgia Educational Directory, 1956-57.4 The list of names
was accompanied by a detachable numbers slip. The numbers on this
slip corresponded to the numbers placed by the names of the
principals on the list. The rater was asked to rate the principals in
terms of effectiveness in over-all school administration in the
following manner: (1) by placing plus marks on the numbers slip
beside the one-third of the principals in that system who were
believed to be ‘‘most effective’’, and (2) by placing minus marks on
the numbers slip beside the one-third of that system’s principals who
were believed to be ‘‘least effective’’. The actual number of
principals to be rated was predetermined by the author.

After these ratings were made, only the numbers slip was returned.
Numbers were used rather than names because the author felt that
rating by number would secure greater cooperation and objectivity
from the raters.

A total of 540 principals in the thirty-nine school 
systems was rated by both their superintendents 
and supervisors (or visiting teachers). Of the num- 
ber rated, 137 principals were placed in the ‘‘most 
effective’’ group and 136 principals were placed in 
the ‘‘least effective’’ group by both raters. These 
273 principals were used as respondents for the
collection of data. 
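
The selection rule described above (a principal enters a group only when both raters mark him the same way) can be sketched in Python. This is an illustration only and does not appear in the original study; the function name `consensus_groups` and the nine-principal slips are hypothetical.

```python
def consensus_groups(rater_a, rater_b):
    """Return the principals marked '+' by both raters and '-' by both raters.

    Each argument maps a principal's slip number to '+', '-', or '' (unmarked).
    """
    most = sorted(n for n, mark in rater_a.items()
                  if mark == "+" and rater_b.get(n) == "+")
    least = sorted(n for n, mark in rater_a.items()
                   if mark == "-" and rater_b.get(n) == "-")
    return most, least


# Hypothetical numbers slips for one system with nine principals;
# each rater marks one-third '+' and one-third '-'.
superintendent = {1: "+", 2: "+", 3: "+", 4: "", 5: "", 6: "",
                  7: "-", 8: "-", 9: "-"}
supervisor = {1: "+", 2: "+", 3: "", 4: "+", 5: "", 6: "-",
              7: "-", 8: "-", 9: ""}

most, least = consensus_groups(superintendent, supervisor)
print("most effective:", most)
print("least effective:", least)
```

Principals on whom the raters disagree fall into neither group, which is why only 273 of the 540 rated principals served as respondents.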





Procedures for Collecting Test Data 





Data obtained from the principals were of two types. The first type
was the completed attitude instrument. The second type consisted of
certain professional and personal characteristics of the respondents
as follows: age of respondent, sex of respondent, experience of
respondent as a teacher, experience of respondent as a principal,
experience of respondent outside the field of education, the number
of positions the respondent has held within the past ten years, and
type of school of which the respondent is now principal. This
information was obtained by having the principals check appropriate
blanks on a form attached to the attitude survey instrument.

The attitude survey form and the check-list were mailed to each of
the 540 principals in the thirty-nine school systems. This action was
taken in order to allay any possible suspicion on the part of the
principals that they had been rated. At the end of two weeks,
follow-up letters were mailed to all respondents who had not returned
their completed forms. At the end of four weeks additional letters
were mailed to those respondents in the ‘‘least effective’’ group who
had not responded. This second follow-up letter was necessary since
the ‘‘least effective’’ group was slower in returning its
instruments.

A total of 228 usable survey instruments was received. The ‘‘most
effective’’ group returned 113 completed forms, while the ‘‘least
effective’’ group returned 115 forms.

The combined respondent group returned 84 percent of its instruments.
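
The 84 percent figure is consistent with a return rate taken against the 273 principals placed in the two rated groups rather than against all 540 who were mailed forms. That reading is an inference from the counts reported above, not a statement by the author. A short Python check, with the hypothetical helper `return_rate`:

```python
def return_rate(returned, selected):
    """Percentage of selected respondents who returned a usable instrument."""
    return 100.0 * returned / selected


# Counts reported in the text: 137 and 136 principals selected,
# 113 and 115 usable instruments returned.
most_rate = return_rate(113, 137)
least_rate = return_rate(115, 136)
combined_rate = return_rate(113 + 115, 137 + 136)

print(f"most effective:  {most_rate:.1f} percent")
print(f"least effective: {least_rate:.1f} percent")
print(f"combined:        {combined_rate:.1f} percent")
```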


SECTION III


THE SELECTION OF TEST ITEMS AND THE 
EVALUATION OF THE TEST 


Selection of the Item Validating Group and 
the Test Validating Group 








COMPLETED ATTITUDE instruments were received from 228 principals in
the total respondent group. Of this total, 113 were rated ‘‘most
effective’’ and 115 were rated ‘‘least effective’’ by superintendents
and supervisors (or visiting teachers). Because of the importance of
item validation in testing the original hypothesis, the writer
decided to use a total of 160 respondents in the item validating
group (eighty representing each level of rated effectiveness). The
remaining instruments were used for test validation (33 in the ‘‘most
effective’’ and 35 in the ‘‘least effective’’ group). The instruments
were grouped in that manner in order to accomplish an adequate
validation of both test and items.

The completed instruments were grouped by two junior staff members of
the College of Education. One staff member selected the following
series: 1, 4, 7, etc., in the ‘‘most effective’’ group. The other one
selected 3, 6, 9, etc., in the group rated ‘‘least effective.’’ This
process was followed until eighty instruments had been selected from
each group. By this method of selection each completed instrument had
approximately two chances out of three of being selected for the item
validating group, and approximately one chance in three of being
selected for the test validating group.


Selection and Weighting of Discriminating 
Test Items 





The replies to the statements on the instru- 
ments in the item validating group were coded 
and transferred to Hollerith cards in order to fa- 
cilitate the handling of the data. An experienced 
card-punch operator was employed to transfer the 
answers to the Hollerith cards. 

Responses to each of the 198 items were tabulated and transferred to
worksheets for statistical analysis. A chi-square test of contingency
was performed for each of the items. For the purposes of this study,
items discriminating beyond the .10 level of significance were
selected for further use and were designated as significant items.
Contingency coefficients were computed for all significant items.
This coefficient carries the same level of significance as the
chi-square. Table I lists the chi-square values and the contingency
coefficients for the thirty items found to be significant. The
degrees of freedom changed from ‘‘3’’ to ‘‘2’’ on eight of the items
because certain of the possible responses attracted few or no
answers. Responses receiving only a few answers were counted as
belonging to the adjacent responses, which in all cases were ‘‘mildly
agree’’ or ‘‘mildly disagree’’. The only responses in these items
receiving so little attention were either ‘‘strongly agree’’ or
‘‘strongly disagree’’.
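
The item analysis just described can be illustrated with a short Python sketch. It is not the author's worksheet procedure. The response counts are hypothetical, the helper names are invented for the example, and the contingency coefficient is computed with the usual definition C = sqrt(chi-square / (chi-square + N)), which is assumed here because the text does not print the formula.

```python
import math


def chi_square(table):
    """Pearson chi-square and degrees of freedom for a table of counts.

    Every column must contain at least one response, otherwise the
    expected count is zero and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def contingency_coefficient(chi2, n):
    """Contingency coefficient C for a chi-square based on n cases."""
    return math.sqrt(chi2 / (chi2 + n))


def collapse_first_two(table):
    """Count a sparse first response with the adjacent one (SA into MA)."""
    return [[row[0] + row[1]] + list(row[2:]) for row in table]


# Hypothetical counts for one item. Rows: most / least effective (80 each).
# Columns: strongly agree, mildly agree, mildly disagree, strongly disagree.
item = [[2, 38, 30, 10],
        [1, 24, 35, 20]]

collapsed = collapse_first_two(item)   # df drops from 3 to 2
chi2, df = chi_square(collapsed)
c = contingency_coefficient(chi2, 160)
print(f"chi-square = {chi2:.2f}, df = {df}, C = {c:.3f}")
```

With two degrees of freedom the tabled chi-square value at the .10 level is 4.61, so an item with counts like these would have been retained.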

The thirty items that were found to be at the .10 level of
significance or below are listed in Table II.

All significant items were weighted according to a formula devised by
Guilford.5 The formula is:


w = [fractional term not legible in this copy] + 4


where
p_u is the proportion of the ‘‘most effective’’
group responding in a specified way;
p_l is the proportion of the ‘‘least effective’’
group responding in the same way;
p is the proportion of the combined groups
according to the formula
TABLE I


DISCRIMINATORY VALUE OF SIGNIFICANT TEST ITEMS


Contingency coefficients, in table order (the item numbers and most
of the chi-square column did not survive extraction):

.219  .212  .198  .214  .219  .173  .213  .192  .200  .187
.317  .235  .249  .199  .176  .187  .204  .212  .215  .236
.236  .227  .187  .175  .207  .198  .226  .179  .222
TABLE II


NUMBER ON ORIGINAL ATTITUDE SURVEY FORM AND STATEMENT OF THIRTY
ITEMS FOUND TO DISCRIMINATE BETWEEN ‘‘LEAST EFFECTIVE’’ AND
‘‘MOST EFFECTIVE’’ PRINCIPALS


Statement


A good principal grants all reasonable requests of teachers and students.

A good principal will remain impartial in elections involving his superiors.

Exceptional students should be allowed extra privileges not ordinarily granted to
other children.

Parents interested in their children will visit the school without encouragement
from the principal.

Controversial school bond issues should have the principals’ support.

Students should be provided with the opportunity to develop moral and spiritual
values at school.

Punishment must sometimes be administered before the cause of a misbehavior
can be investigated.

Substitute teaching is one of the duties of the principalship.

Good grades must be required as a prerequisite for student participation in
athletic events.

Creditable performance should always receive special recognition.

The school’s instructional program must be geared to meeting the needs of the
average student.

A principal should be the group leader in school meetings and conferences.

There is a tendency to neglect the ‘‘three R’s’’ in the instructional program.

A good principal must provide a professional library for his teachers.

An alumni group may sometimes be permitted to supervise the school athletic
program.

A principal often finds it necessary to be late for appointments.

Young people should be persuaded to enter the teaching profession.

Handicapped children should be placed in the classroom with normal children.

Accreditation standards of the state are not important to the school’s
instructional program.

The school must have an orientation program for parents of pre-school age
children.

A good principal will consider religious beliefs of students in establishing school
policy.

A good principal may have difficulty in making school regulations properly
understood.

Out-of-school recreational activities should be included as a part of the
instructional program.

A regulation that teachers oppose is not a good regulation.

A good principal must maintain a reserved relationship with students.

The school board should establish the policies of the school.

Students should be persuaded to participate in a wide range of activities.

New teachers should be given close supervision during their first year of teaching.

A principal will carefully plan all school activities.

A principal should not encourage outside evaluation of his school.








JOURNAL OF EXPERIMENTAL EDUCATION 


$$p = \frac{p_u + p_l}{2}$$


q is 1 - p


The weights of the items were then rounded off to the nearest .5.
These weights were multiplied by two in order to eliminate the
decimal. After this operation it was found that ‘‘2’’ could be
subtracted from each weight without giving it a negative value. This
procedure facilitated the computation of test scores. The weights for
the significant items are listed in Table III. These items are listed
in random order so as to conceal their identity. The weights ranged
from ‘‘0’’ to ‘‘8’’ points. The lowest possible score for the
thirty-item instrument is 51, while 167 is the highest possible
score.
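
The three-step conversion described above (round to the nearest .5, double, subtract 2) can be sketched as follows. The raw weights are hypothetical, and the function name `to_integer_weight` is introduced only for this illustration. Under this reading a final range of 0 to 8 corresponds to raw weights between 1.0 and 5.0.

```python
import math


def to_integer_weight(raw_weight):
    """Round a raw weight to the nearest .5, double it, and subtract 2.

    Rounding to the nearest .5 and then doubling is the same as rounding
    twice the raw weight to the nearest whole number. Ties are rounded
    upward here, which is an assumption; the text does not say how ties
    were handled.
    """
    doubled = math.floor(raw_weight * 2 + 0.5)
    return doubled - 2


# Hypothetical raw weights for four response alternatives.
raw_weights = [1.0, 2.3, 3.74, 5.0]
integer_weights = [to_integer_weight(w) for w in raw_weights]
print(integer_weights)
```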

After each response for the significant items 
had been weighted a scoring key was constructed. 
All papers for the entire 228 respondents were 
scored and numerical grades were given them. 
The ‘‘lowest actual’’ score made was 83, while 
140 was the ‘‘highest actual’’ score. 
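
Scoring with a key of this kind amounts to looking up the weight of each chosen response and summing over the items. A minimal Python sketch follows. The three items and their weights are hypothetical and are not taken from Table III.

```python
def score_instrument(responses, scoring_key):
    """Sum the weights attached to the response chosen for each keyed item."""
    return sum(scoring_key[item][answer] for item, answer in responses.items())


# Hypothetical key: SA = strongly agree, MA = mildly agree,
# MD = mildly disagree, SD = strongly disagree.
scoring_key = {
    "item_a": {"SA": 6, "MA": 4, "MD": 3, "SD": 1},
    "item_b": {"SA": 2, "MA": 3, "MD": 5, "SD": 7},
    "item_c": {"SA": 0, "MA": 4, "MD": 4, "SD": 8},
}

responses = {"item_a": "MA", "item_b": "SD", "item_c": "MD"}
total = score_instrument(responses, scoring_key)

# The lowest and highest possible scores follow from the key itself.
lowest = sum(min(weights.values()) for weights in scoring_key.values())
highest = sum(max(weights.values()) for weights in scoring_key.values())
print(total, lowest, highest)
```

The same minimum and maximum calculation, applied to the full thirty-item key, is presumably how the possible range of 51 to 167 was obtained.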


The Validity of the Test 





Validity is the crucial problem in any attempt to construct measuring
devices. In this project the problem can be stated in terms of a
question: How well does this test discriminate between the ‘‘most
effective’’ and the ‘‘least effective’’ principals? The researcher,
in selecting the items for the test, selected only those items that,
according to the chi-square test of contingency, discriminated
between the two levels of rated effectiveness. This method of
selecting items insured a certain amount of validity for the
instrument.

The method used to compute the validity coefficient of the test was
the point bi-serial correlation formula. The point bi-serial r was
transformed into the bi-serial r by use of the following conversion
formula listed by Guilford:6


$$r_b = r_{pbi} \, \frac{\sqrt{pq}}{y}$$


where
r_b is the bi-serial coefficient of correlation;

r_pbi is the point bi-serial coefficient of correlation;

p is the proportion of cases in the ‘‘most effective’’ group;

q is the proportion of cases in the ‘‘least effective’’ group;

y is the ordinate of the normal distribution curve with surface
equal to 1.00 at the point of division between segments containing
p and q proportions of the cases.


Although two levels of rated effectiveness had 
been obtained, it was felt that this conversion 
could be made for the reason that rated effective- 
ness is continuous. Within each group of princi- 
pals many levels of rated effectiveness in over- 
all school administration were represented. 
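
The conversion can be carried out numerically as in the following Python sketch, which uses the standard library's normal distribution to find the ordinate y. The point bi-serial value of .55 is hypothetical, chosen only to show the size of the adjustment; the group sizes are those of the test validating group.

```python
import math
from statistics import NormalDist


def biserial_from_point_biserial(r_pbi, p):
    """Convert a point bi-serial r to a bi-serial r.

    p is the proportion of cases in the upper group; y is the ordinate of
    the unit normal curve at the point dividing the area into p and 1 - p.
    """
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return r_pbi * math.sqrt(p * q) / y


# Test validating group: 33 "most effective" and 35 "least effective".
p_upper = 33 / (33 + 35)
r_b = biserial_from_point_biserial(0.55, p_upper)  # 0.55 is hypothetical
print(f"bi-serial r = {r_b:.2f}")
```

When the two groups are nearly equal in size the multiplier is close to 1.25, so the bi-serial coefficient is always noticeably larger than the point bi-serial coefficient from which it is derived.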

The validity coefficient was computed by con- 
version from the point bi-serial coefficient 
rather than directly by the bi-serial coefficient 
formula so as to eliminate the necessity to score 
the large number of instruments from the princi- 
pals who were not rated in either the most or 
least effective groups. The scores which were 
considered in the point bi-serial formula repre- 
sented the scores of principals from the two ex- 
tremes of the continuum of rated effectiveness. 

Two coefficients of validity were computed by the method given above.
The coefficients are given in Table IV. The first coefficient of
validity is the correlation between the ‘‘most effective’’ and the
‘‘least effective’’ groups used for item validation. A validity
coefficient of .85 was obtained from these two groups. This
correlation is, of course, inflated because it was obtained from the
same sample used to compute the item validity. This coefficient was
obtained in order to determine the amount of spurious correlation
which is involved when a validity coefficient is computed from the
same population that is used for item analysis.

A bi-serial correlation for the test validating 
group was computed. This validity coefficient 
was .71. In comparing the two coefficients it is 
obvious that the coefficient of .85 is somewhat 
inflated. 

As the validating criterion (the ratings of the principals) was not
completely reliable, the researcher felt that a correction for
attenuation should be made. In order to make this correction, it was
necessary to obtain the reliability coefficient of the criterion,
that is, to compute the correlation between the ratings given by
superintendents and those given by supervisors (or visiting teachers)
for all of the counties where separate ratings were obtained. This
correlation was found to be .83 by the Pearson product-moment method.
After the correction for attenuation was made, the validity of the
test was .78.
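
The reported figures agree with the usual correction for unreliability in the criterion alone, in which the obtained validity is divided by the square root of the criterion's reliability. The text does not print the formula, so its use here is an assumption, and the function name is hypothetical.

```python
import math


def correct_for_criterion_unreliability(validity, criterion_reliability):
    """Correct a validity coefficient for attenuation in the criterion only."""
    return validity / math.sqrt(criterion_reliability)


# Figures from the text: validity .71, inter-rater correlation .83.
corrected = correct_for_criterion_unreliability(0.71, 0.83)
print(f"corrected validity = {corrected:.2f}")
```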

The validity coefficient obtained from the test 
validating group is fairly high. It suggests that 
the thirty-item test has some real utility in eval- 
uating the rated effectiveness of the public school 
principal. 

A further attempt to check the validity of the 
test was made. This attempt involved a compar- 
ison in the means of the groups of principals ra- 
ted ‘‘most effective’’, with the means of the 
groups rated ‘‘least effective’’. Table V gives 
the mean scores and the standard deviations of 
TABLE IV


Item Validation (N = 160)    .85
Test Validation (N = 68)     .71
TABLE V


MEANS AND STANDARD DEVIATIONS OF SCORES ON THE ITEM ANALYSIS AND
TEST VALIDATING GROUPS


Group


Total
    Most Effective
    Least Effective

Item Validation
    Most Effective
    Least Effective

Test Validation
    Most Effective
    Least Effective


the scores. In a normal distribution two-thirds of the scores fall
within plus and minus one standard deviation from the mean. By actual
count approximately 70 percent of the scores in the ‘‘most
effective’’ group were between 117 and 132, and in the ‘‘least
effective’’ group 67 percent were between 102 and 119. The overlap of
two points is quite small. Table VI gives the reliability of the
difference in the means in the respective groups. An inspection of
the critical ratios in Table VI shows that the ‘‘most effective’’ and
the ‘‘least effective’’ groups differ in all cases well beyond the
one percent level of significance.


TABLE VI


RELIABILITY OF DIFFERENCE IN THE MEANS OF THE ‘‘MOST EFFECTIVE’’ AND
‘‘LEAST EFFECTIVE’’ GROUPS


Group                    CR        P. Value

Validity Group                     <.01
Item Validity Group      11.33     <.01
Total Group                        <.01


A final attempt to check the validity of the test consisted of a
comparison of the mean score of the total group of principals rated
‘‘least effective’’ with the mean score of a group of ten College of
Education staff members who are specialists in public school
administration. These specialists have had close contact with
principals in the field, although five of them have had no actual
experience in the principalship. This comparison is given in Table
VII. The critical ratio is significant far beyond the one percent
level of significance.

Three different procedures were used in computing the validity of
this attitude instrument: (1) the bi-serial r between the scores of
the ‘‘most effective’’ and ‘‘least effective’’ principals, (2) the
difference in the means of the ‘‘most effective’’ and ‘‘least
effective’’ principals, and (3) the difference in the means of the
scores of ‘‘least effective’’ principals and a group of specialists
in educational administration. Each method suggests that the
instrument has sufficient validity for it to be used in screening
applicants for principals’ positions.





TABLE VII


RELIABILITY OF DIFFERENCE IN THE MEANS OF THE ‘‘LEAST EFFECTIVE’’
GROUP AND THE SPECIALISTS


Group              N      Mean      S. D.    CR      P. Value

Least Effective    115    110.65    8.66
                                             5.97    <.01
Specialists        10     120.50    4.32
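
The critical ratio in Table VII can be approximately reproduced from the reported means and standard deviations. The sketch below rests on two assumptions that the article does not state: the group sizes are taken from the text (115 ‘‘least effective’’ respondents and ten specialists), and the standard error of each mean is computed with N - 1 in the denominator, as was common for small samples. The function name is hypothetical.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Each variance is divided by N - 1 (an assumption; see the lead-in).
    """
    standard_error = math.sqrt(sd_a ** 2 / (n_a - 1) + sd_b ** 2 / (n_b - 1))
    return abs(mean_a - mean_b) / standard_error


cr = critical_ratio(110.65, 8.66, 115, 120.50, 4.32, 10)
print(f"critical ratio = {cr:.2f}")
```

Under these assumptions the result falls within about .01 of the tabled value of 5.97. Dividing by N instead of N - 1 gives a ratio of roughly 6.2, so the N - 1 form is the closer match.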





Reliability of the Test 





Several attempts to estimate the reliability of the test were made.
One attempt used the Pearson r formula applied to data computed by
the odd-even split-half method of dividing the test. A correlation of
.46 was obtained by this formula. This correlation was raised by the
following Spearman-Brown formula as given by Garrett:


$$r_{nn} = \frac{n \, r_{11}}{1 + (n - 1) \, r_{11}}$$


where r_11 is the correlation between the two halves and n is the
number of times the test is lengthened (here, 2).

The raised reliability of .63 was obtained for the attitude survey
instrument by this method applied to the entire respondent group.
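
The step from .46 to .63 can be checked directly. The following Python sketch applies the Spearman-Brown formula with n = 2, the value appropriate to a split-half correlation; the function name is hypothetical.

```python
def spearman_brown(r_half, n=2):
    """Estimated reliability of a test n times as long as the part measured."""
    return n * r_half / (1 + (n - 1) * r_half)


# Odd-even split-half correlation reported in the text.
full_length_reliability = spearman_brown(0.46)
print(f"stepped-up reliability = {full_length_reliability:.2f}")
```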

The standard error of an obtained score is often used to express the
reliability of a test. The standard error of measurement was computed
for the entire group of principals rated ‘‘most effective’’ and those
rated ‘‘least effective’’ (N = 228) and was found to be 6 score
points.
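
The standard error of measurement is ordinarily computed as the standard deviation of the scores multiplied by the square root of one minus the reliability. The article does not print the formula or the standard deviation of the total group, so the sketch below is an illustration under that assumption. It also assumes that the stepped-up reliability of .63 was the coefficient used. Both function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Standard error of an obtained score."""
    return sd * math.sqrt(1.0 - reliability)


def implied_standard_deviation(sem, reliability):
    """Score standard deviation implied by a reported standard error."""
    return sem / math.sqrt(1.0 - reliability)


# Hypothetical test with a standard deviation of 10 and reliability of .64.
example_sem = standard_error_of_measurement(10.0, 0.64)

# Standard deviation implied by the reported figures (6 points, r = .63).
implied_sd = implied_standard_deviation(6.0, 0.63)
print(f"example SEM = {example_sem:.1f}, implied S.D. = {implied_sd:.1f}")
```

An implied standard deviation of a little under 10 score points is in keeping with the score ranges quoted for the two groups.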


SECTION IV 


ANALYSIS OF THE RELATION OF RATINGS OF
EFFECTIVENESS TO CERTAIN PROFESSIONAL
AND PERSONAL CHARACTERISTICS OF PRINCIPALS


A SECOND objective of this project was to 
test the hypothesis that rated effectiveness is re- 
lated to certain personal and professional char- 
acteristics of the principal. These characteris- 
tics are: age of respondent, sex of respondent, 
type of school of which respondent is principal, 
teaching experience of respondent, administra- 
tive experience of respondent, non-school exper- 
ience of respondent, and the number of positions 
respondent has held within the past ten years. 
Null hypotheses were set up for each of these 
variables. Chi-square tests of contingency 
were computed between these variables and the 
principals’ rated effectiveness to test these hy- 
potheses. 





The following null hypotheses were tested: 


1. There is no relationship between age of the principal and his
rated effectiveness.

2. There is no relationship between sex of the principal and his/her
rated effectiveness.

3. There is no relationship between the type of school in which the
principal serves and his rated effectiveness.

4. There is no relationship between the length of teaching experience
of the principal and his rated effectiveness.

5. There is no relationship between the length of the principal’s
experience as a principal and his rated effectiveness.

6. There is no relationship between the length of the principal’s
experience outside the field of education and his rated
effectiveness.

7. There is no relationship between the number of positions the
principal has held within the past ten years and his rated
effectiveness.


Table VIII gives a summary of the findings. 

The first null hypothesis was accepted since 
the chi-square test showed no significant rela- 
tionship between the age of the principal and his 
rated effectiveness. 

The second null hypothesis was accepted since 
the observed relationship between the sex of the 
principal and his/her rated effectiveness did not 
exceed chance expectancy. 

The third null hypothesis was rejected since 
the relationship was significant beyond the .01 
level. There is a significant relationship be- 
tween the type of school the respondent has 
charge of and his rated effectiveness. 

The fourth null hypothesis was accepted since 
the relationship between length of teaching exper- 
ience and the principal’s rated effectiveness was 
not significant. 

The fifth null hypothesis was tentatively rejected since the
relationship was significant between the .02 and .05 levels. The
principals rated as ‘‘least effective’’ had less experience in public
school administration than did those who were rated as ‘‘most
effective’’.

The sixth null hypothesis was tentatively rejected since the
relationship was significant between the .02 and .05 levels. The
relationship was an inverse one. The ‘‘most effective’’ principals
tended to have fewer years of experience outside the field of
education than did the ‘‘least effective’’ ones.


TABLE VIII


SUMMARY OF RELATIONSHIPS BETWEEN LEVELS OF RATED EFFECTIVENESS
AND CERTAIN PERSONAL AND PROFESSIONAL CHARACTERISTICS


Variable                                          Chi-square    df    P. Value

Sex of Respondent                                 2.00                .10 - .20
Age of Respondent                                 6.90                .10 - .20
Type of School                                    16.66               .01
Length of Teaching Experience                     2.16                .50 - .70
Length of Administrative Experience               9.26                .02 - .05
Length of Non-school Experience                   9.18                .02 - .05
Number of Positions Held Within Past Ten Years    5.12                .20 - .30


The seventh null hypothesis was accepted 
since the relationship between the number of po- 
sitions the respondent has held within the past 
ten years and his rated effectiveness was not sig- 
nificant. 

The analysis of the relationships between levels of rated
effectiveness and certain personal and professional characteristics
of principals indicated that rated effectiveness of principals is
related to the type of schools in which they are employed. Also,
rated effectiveness of principals seems to be related to the length
of experience that they have as principals. Finally, rated
effectiveness seems to be inversely related to the length of
experience that principals have outside the field of education.


SECTION V 


SUMMARY, MAJOR FINDINGS, AND 
RECOMMENDATIONS 


THIS STUDY is one facet of the research being carried on through the
efforts of the Cooperative Program in Educational Administration and
the Kellogg staff of the University of Georgia. The major purpose of
the Cooperative Program assumes that better communities result from
an improvement of public education at the local level.





Summary of Procedures 





This study had two major purposes. The first 
purpose was to attempt to construct a measuring 
device which would discriminate between ‘‘most 
effective’’ and ‘‘least effective’’ principals by at- 
tempting to isolate the characteristic attitudes of 
principals belonging to two distinct groups in 
terms of rated effectiveness. The second pur- 
pose was to attempt to relate certain personal 
and professional characteristics of principals to 
levels of rated effectiveness in overall school ad- 
ministration. 

The first step in carrying out the objectives of the study was the
development of statements which would reflect principals’ attitudes
toward duties and responsibilities related to the principalship. The
Composite List of Critical Requirements was used as the source in
constructing the statements. A total of 198 statements, one for each
of the critical requirements, was developed. The statements were
edited by College of Education staff members. A tentative attitude
scale was developed using each statement as the basis for a test
item.

The second major step in carrying out the ob- 
jectives of the study was the selection of the re- 
spondent groups. Two distinct groups of princi- 
pals in terms of rated effectiveness in overall 
school administration were selected. This was 
accomplished by superintendents and supervisors 
(or visiting teachers) who selected one-third of 
their ‘‘most effective’’ principals, and one-third 
of their ‘‘least effective’’ principals. This selec- 
tion was made in thirty-nine school systems. A 
total of 137 principals was placed in the ‘‘most
effective’’ group by both raters, while a total
of 136 principals was placed in the ‘‘least effective’’
group. These principals represented the
respondent group. 

The tentative attitude scale accompanied by a
cover letter requesting certain control informa- 
tion pertaining to the personal and professional 
characteristics of the principals was mailed to 
all respondents. A total of 228 usable attitude 
scales was returned by the respondents. 

In order to validate both the test items and 
the test, two groups of principals from each lev- 
el of rated effectiveness were selected. This se- 
lection was done on the basis of randomness. 
Eighty instruments representing each level of 
rated effectiveness were selected for the pur- 
pose of item validation. The remaining respon- 
dents (thirty-three from the ‘‘most effective’’ 
level and thirty-five from the ‘‘least effective’’ 
level) were used to validate the test. 
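
The random division into an item-validating group and a test-validating group can be sketched as follows. This is only an illustration: the respondent identifiers, the seeds, and the function name `split_for_validation` are assumptions, and the group totals of 113 and 115 are inferred from the counts reported above (80 + 33 and 80 + 35).

```python
import random


def split_for_validation(respondent_ids, n_item_group, seed=0):
    """Randomly split one effectiveness level into two validation groups.

    Returns (item_validation_group, test_validation_group).
    """
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_item_group], shuffled[n_item_group:]


# Hypothetical identifiers for the 228 usable returns.
most_effective = [f"M{i:03d}" for i in range(113)]
least_effective = [f"L{i:03d}" for i in range(115)]

item_most, test_most = split_for_validation(most_effective, 80, seed=1)
item_least, test_least = split_for_validation(least_effective, 80, seed=2)
```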

The individual test responses of the 160 prin- 
cipals in the item validating group were trans- 
ferred to Hollerith cards in order to facilitate 
the handling of data. 

An item analysis was made for each of the 
198 items in the instrument in order to select 
those items which discriminated significantly be- 
tween the responses of principals in the ‘‘most 
effective’’ group versus those in the ‘‘least ef- 
fective’’ group. The chi-square test of contin- 
gency was used for this item analysis. 
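
A chi-square test of contingency for a single item might be computed as in the sketch below. The response counts are hypothetical, a two-category response (agree or disagree) is assumed because the summary does not give the response format, and no continuity correction is applied. The critical value 3.841 is the standard chi-square value for one degree of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical responses to one item.
# Rows: "most effective" group, "least effective" group (80 principals each).
# Columns: agree, disagree.
item_table = [[55, 25],
              [35, 45]]

CRITICAL_05_1DF = 3.841  # chi-square critical value, 1 d.f., .05 level
statistic = chi_square(item_table)
discriminates = statistic > CRITICAL_05_1DF
```

An item would be retained under this sketch only when `discriminates` is true. The actual retention criterion used in the study is not stated in this summary.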

All discriminating items were weighted by an 
appropriate formula. A scoring key was con- 
structed for the thirty items included in the final 
instrument. All instruments for the 228 respon- 
dents were scored and given a numerical grade. 

Coefficients of validity were computed for the 
thirty-item test. 

Coefficients of reliability were computed for 
the thirty-item test. 

The data pertaining to the personal and professional
characteristics of principals were analyzed
in order to determine whether they were
significantly related to rated effectiveness. The
chi-square test of contingency was used in this
procedure.


Major Findings 


The findings of this study are in the areas of 
the two major objectives. The first objective 
was to test the hypothesis that the professional 
attitudes of principals are significantly related 
to their rated effectiveness in performing the 
duties of the principalship. In relation to this
objective the following findings are indicated: 


1. Thirty self-reported attitudes of principals 
were found to be related to their rated effec- 
tiveness in performing the duties of the prin- 
cipalship. 

2. These isolated attitudes served as the basis
for constructing a test which is considered to
have value in discriminating between ‘‘most
effective’’ and ‘‘least effective’’ principals.

3. The reliability and validity of the test so constructed
are considered to be sufficiently high
to insure its value to superintendents of
schools for use in screening applicants for
principals’ positions.


The second objective of this study was an at- 
tempt to relate certain personal and professional 
characteristics of principals to levels of rated 
effectiveness in overall school administration. In 
relation to this objective the following findings 
were obtained: 


1. The sex of the principal is not significantly re- 
lated to rated effectiveness. 

2. The age of the principal is not significantly re- 
lated to rated effectiveness. 

3. The type of school the principal has charge of
is significantly related to rated effectiveness.
This relationship is significant beyond the .01
level of confidence.

4. Length of teaching experience is not significantly
related to rated effectiveness of the
principal.

5. Length of experience as a principal tends to
be related significantly to the principal’s rated
effectiveness. The ‘‘most effective’’ principals
have more years of experience than do
the ‘‘least effective’’ ones. This relationship
is significant beyond the .05 level of confidence.

6. Length of work experience outside the field of
education tends to be significantly related to
the principal’s rated effectiveness. The effect
is an inverse one since the ‘‘most effective’’
principals have fewer years of experience outside
education than do the ‘‘least effective’’
ones. This inverse relationship is significant
beyond the .05 level of confidence.

7. The number of positions held within the last
ten years is not related to the principal’s
rated effectiveness.


Recommendations 





Attitude scales of the type developed in this 
project seem to have good possibilities for use 
in the selection of principals. The lack of meth- 
ods of demonstrated validity for screening appli- 
cants for the principalship indicates the need for 
a scale of this type. Therefore, the following 
recommendations seem to deserve attention: 


1. This scale should be increased in length using
methods similar to the ones employed in this
study. As the final scale consists of only
thirty items, additional valid items should
increase its validity and reliability. After
the test has been increased in length, new
investigations should be made to further improve
the reliability and validity of the test.

2. An investigation should be made into the possibility
of grouping similar attitude scale
items into broad related areas of the principalship
in order to develop a profile sheet for
showing graphically possible strengths and
weaknesses of applicants for principals’ positions.

3. The validity of this instrument should be investigated
by researchers in other localities
using different rating publics.

4. An evaluation of the predictive value of this
scale should be made. This evaluation should
be a long-range one involving several sizeable
school systems.

5. Additional research may be needed to clarify
the results of this study concerning the relationship
of the rated effectiveness of the principal
and the following data:
a. The type of school of which he is principal.
b. The length of experience he has as a
principal.
c. The length of experience he has outside
the field of education.

6. Although this instrument is avowedly in an
area extremely difficult to measure, a satisfactory,
usable, valid measuring device
should find widespread use and application.
Therefore, if after additional research, this
instrument is proven to have value in predicting
success as a principal, it should be published
and/or otherwise made available for
general professional use.


A NATIONWIDE SAMPLE OF GIRLS
FROM SCHOOL LISTS 


JANE WILLIAMS BERGSTEN 
University of Michigan 


1. Introduction 





THIS PAPER describes the procedures used 
in selecting a nation-wide sample of girls from 
school lists.1* The population to be covered was
all girls in the United States in school grades 6 
through 12. Because of the nature of the study, 
it was desirable to exclude from our population 
all girls attending special schools such as schools
for mentally or physically handicapped or correc- 
tional institutions; therefore, girls attending 
these types of schools were not included in the 
sample. 

A sample of about 1800 girls was desired, 
and each girl in the defined population was to 
have an equal chance of being included in the 
sample. This is a probability sample which permits
us to generalize our findings with statistical
validity to all girls in the defined population.

In drawing the sample a technique known as 
multi-stage probability sampling was used. First, 
a sample of primary sampling areas was select- 
ed. Each primary sampling area consisted of a 
county or group of counties. Then, within each 
selected primary sampling area, a list of all
schools in the area was obtained; from these 
lists the sample of schools was selected. With- 
in each selected school, a list of all classes in 
grades 6 through 12 was obtained, and from 
these lists a selection of classes was made. With- 
in each selected class a list of all girls in the 
class was obtained, and from these lists, girls 
to be interviewed for the study were selected. 

In selecting a sample, it is desirable to bal- 
ance two conflicting factors, economy and preci- 
sion. The more wide-spread the sample, the 
greater the precision but also the greater the 
cost. Conversely, the more highly clustered the 
sample, the less the precision but the smaller 
the cost. In designing a sample, the aim is to 
obtain the most precision for the least amount of 
money. Keeping this aim in mind, it was decid- 
ed to select the sample in such a way that it 
would yield on the average about three or four 
schools per primary sampling area, about two 


*All footnotes will be found at end of article.





classes per school and about four girls per class.
The interviewing would thus be concentrated in
several spots in a primary sampling area, which
would keep the cost low and yet would be spread
over several different neighborhoods and grades, 
thus keeping precision high. Use will be made of 
these decisions in the following sections. 

The procedures used in selecting the sample 
at each of the several stages will be described in
sections 2 through 8. The combination of the var- 
ious rates of selection into an overall equal prob- 
ability of selection for all girls in the United 
States is postponed until section 9. 


2. Selection of the Primary Sampling Areas 





The first stage in selecting the sample con- 
sisted of the selection of the Survey Research 
Center’s basic sample of primary sampling 
areas. Each of the primary sampling areas con- 
sisted of a county or group of counties. This was 
deemed a convenient size area for an interview- 
er to cover, since she could travel anywhere 
within her county, take some interviews, and re- 
turn home, all in the same day. Thus the county 
was a convenient sampling unit from the point of 
view of economy. Within a county, the population
is relatively heterogeneous, since a county usually
consists of a city or large town, several
small towns, and rural areas. This heterogeneity
makes the county a convenient unit from the
point of view of the precision of the survey.

The basic procedure for the selection of the 
primary sampling areas was first to divide all of
the counties in the entire country into 66 strata, 
and to select one primary sampling area from 
each stratum. Each of the 12 largest metropoli- 
tan areas in the country formed a stratum by it- 
self and was selected with certainty (with a prob- 
ability of 1) so as to represent itself. Each of 
the remaining 54 strata contained counties that 
were similar to one another with respect to var- 
ious economic, political, geographic and demo- 
graphic characteristics. For each of these 54 
strata, one primary sampling area was selected
from among all those in the stratum to represent
the entire stratum. The probability of selection
assigned to each primary sampling area was proportional
to the number of people in that area.
For example, if there were 2,000,000 people in
a particular stratum and 100,000 people in a given
primary sampling area in that stratum, the
probability of selecting that primary sampling
area would be


100,000/2,000,000 = 1/20.


These 66 primary sampling areas have been 
our basic sample of areas within which we have 
been working for several years. A staff of interviewers
has been hired and maintained in each
of the sample points to work on the various sur- 
veys done by the Survey Research Center. 

For most of our nationwide surveys we select 
a sample of dwelling units within the primary 
sampling areas.2 This method was deemed impractical
in sampling a population such as adolescent
girls, since only a small proportion of
dwelling units (less than 15%) contain a member 
of our population. We would have had to select 
some 13,000 or more dwelling units in the 
country in order to obtain the desired 1800 inter- 
views. The cost of locating the respondents 
would thus be great using this type of sampling 
procedure. Instead, we decided to select our 


sample of girls by first selecting a sample of 
schools. 


3. Selection of Schools Within Primary 
Sampling Areas 








Girls in grades 6 through 12 can be enrolled 
in either primary or secondary schools. A list 
of secondary schools in the country, Directory
of Secondary Schools, 1951-1952,3 was available,
but there was no list of primary schools.
Furthermore, since the Directory was compiled 
in 1951-52, it does not include those secondary 
schools that were established since that date. 
Because of the nature of the data available, it 
was decided to form three basic strata for the 
selection of schools within each of our primary 
sampling areas. These three strata were: 





1. Those secondary schools that were 
listed in the Directory of Secondary 
Schools, 1951-1952. 








2. Those secondary schools that were
established since the Directory of 
Secondary Schools was compiled. 





3. Primary schools 


Each school in the country containing members
of our population, with the exception of a few old
unaccredited private secondary schools, would 
fall into one of the above strata. Selection of 
schools was made separately for each of these 
three strata. 


4. Selection of Schools From the Directory 
of Secondary Schools, 1951-1952. 








The schools in the Directory are listed according
to state, and within each state, alphabetically
by the town in which they are located, or post
office address. For each state within which a
primary sampling area is located, the entries in
the Directory were checked against the Postal
Guide to determine the counties in which the
schools are located. All those schools located
in primary sampling areas constituted a list from
which schools would be selected. Thus each
school in the Directory was given the same probability
of appearing on our ‘‘list’’ as the probability
of selection of the county in which it is located.
For each secondary school located in a selected
primary sampling area a card was made
up showing:


1. Primary sampling area identification
2. Identification of school
3. Type of school (e.g., ‘‘junior high school’’)
4. Race (in segregated school systems)
5. Whether public or private
6. Enrollment


For each primary sampling area the cards 
were placed into groups (strata). These group- 
ings were made as follows: 


1. Private and parochial schools. These were 
further sub-divided into Catholic parochial 
and other private and parochial schools. 
The division was determined on the basis 
of the name of the school. 


2. Public schools. These were further grouped
by race (where segregated), size of school,
and type of school.


On the basis of these groupings, the cards
were ordered, and a systematic selection of 
schools was made. The probability of selection 
given to each school within the county was pro- 
portional to the number of students in the school. 
For example, let us assume that within a partic- 
ular primary sampling area there were a total 
of 5800 students in schools that were listed in the 
Directory, and that among these schools was 
School B with 1160 students. If we wished to se- 
lect two of these schools, then the probability 
that school B would have of being selected from 
among all such schools in the primary sampling 
area would be (2)(1160)/5800. 







The probability of selection for any school in 
the country was thus dependent upon two prob- 
abilities: the probability that the county in which 
the school is located had of being selected, and 
the probability that the school had of being se- 
lected from among all such schools in the county. 

The systematic selection of schools was made 
by first determining the interval to be used in 
each primary sampling area (in the above exam- 
ple, this interval would be 5800/2). A random 
number between 1 and the interval was picked, 
and then the interval was added successively to 
the random start yielding a series of selection 
numbers. These numbers determined which 
schools were selected into the sample. 

Keeping the cards in the above mentioned or- 
der, the numbers of students in the schools were 
cumulated. As soon as the cumulation spanned 
one of the selection numbers, that school was se- 
lected into the sample. (If in cumulating, the ad- 
dition of a school spanned two selection numbers, 
that school was considered to have been select- 
ed twice. ) 

In the example, we would pick a random num- 
ber between 1 and 5800/2 or 2900. Let us as- 
sume that the random number picked was 0982. 
This random number would determine the first 
school selection, and the random number plus 
the interval or 0982 plus 2900 = 3882 would determine
the second school selection. The numbers
of students in the schools would then be
cumulated, and the selections determined. For 
example: 


School      No. Stud.    Cum.         Selection
Identif.    in School    No. Stud.    Numbers

A              871          871
(B)           1160         2031       0982 = RS
C              981         3012
D              781         3793
(E)           1040         4833       3882 = RS + I
F              967         5800





Since the selection number 0982 was spanned
by the range of numbers 872-2031, school B
would be selected into the sample with a probability
of 1160/2900. Similarly, school E would be
selected with a probability of 1040/2900. Using this
procedure, 150 secondary schools were selected
into the sample.
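
The systematic selection with probability proportional to size described in this section can be sketched in a few lines. This is an illustrative sketch, not the Center's actual procedure: the function name `systematic_pps` and the school labels A, C, D and F are assumptions (the text names only schools B and E), and the interval is taken as a whole number.

```python
from itertools import accumulate


def systematic_pps(sizes, interval, random_start):
    """Systematic selection with probability proportional to size.

    Returns (selected_indices, selection_numbers). A unit whose cumulated
    range spans two selection numbers appears twice in selected_indices.
    """
    if not 1 <= random_start <= interval:
        raise ValueError("random_start must lie between 1 and the interval")
    total = sum(sizes)
    # Random start, then the interval added successively.
    selection_numbers = list(range(random_start, total + 1, interval))
    selected = []
    lower = 0
    for index, upper in enumerate(accumulate(sizes)):
        # A unit is selected once for each selection number it spans.
        hits = sum(1 for s in selection_numbers if lower < s <= upper)
        selected.extend([index] * hits)
        lower = upper
    return selected, selection_numbers


# Enrollments from the example; labels other than B and E are assumed.
schools = {"A": 871, "B": 1160, "C": 981, "D": 781, "E": 1040, "F": 967}
names = list(schools)
interval = sum(schools.values()) // 2  # 5800/2 = 2900, for two selections
picked, numbers = systematic_pps(list(schools.values()), interval, 982)
picked_names = [names[i] for i in picked]
```

With the random start 0982 of the example, the selection numbers are 982 and 3882, and the schools picked are B and E.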


5. Selection of Newly Established 
Secondary Schools








Since the Directory of Secondary Schools,
1951-1952 contained only those schools that were
in existence in 1951-52, a list of newly established
secondary schools was constructed. In
each of our primary sampling areas, the interviewer
was asked to contact the various boards of
education involved to obtain a list of all secondary
schools that had been established since 1950.
Since all schools listed in the Directory had already
been given their chance of selection, the
lists of new schools were compared with the Directory
and those schools that were included in the
Directory, as well as on the list of new schools,
were removed from the list. Thus, our list of
new schools was corrected so that it included 
only new secondary schools that were not listed 
in the Directory of Secondary Schools. The inter- 
viewers were asked, also, to obtain an estimate 
of the number of students enrolled in each of these 
new secondary schools. Where this figure was 
not available, we estimated the approximate size 
of the school, taking into consideration the num- 
ber of grades the school included, the size of the 
area it serviced, and the size of other nearby 
schools. Schools from this list were selected into
the sample by using the same procedure described
above for selecting schools from the Directory.
A random number between 1 and the interval
was picked, and the interval was successively
added to the random start to determine the
selection numbers. The numbers of students in 
the schools were then cumulated. As soon as the 
cumulation spanned one of the selection numbers, 
that school was selected into the sample. By this 
procedure 10 secondary schools were selected in- 
to the sample. 


6. Selection of Primary Schools 





Since no list of primary schools in the country 
as a whole could be obtained, the interviewers in 
each of our primary sampling areas were asked 
to obtain a list of all primary schools in their 
area. They were asked to find out what grades 
each school contained, whether the school was 
public or private, and the total number of stu- 
dents in grades 6 and over, or if this information 
was not available, the total number of students in 
the school, from which the total number of stu- 
dents in grade 6 and over could be estimated. 

Primary schools were selected into the sam- 
ple by using the same procedure as was used for 
selecting schools from the Directory, and for selecting
new secondary schools. A random number
was chosen, and the interval was added successively
to it to determine the selection numbers.
The estimated numbers of students in grade 6
and over were cumulated, and the schools selected.
In all, 111 elementary schools were selected
into the sample.


7. Selection of Classes Within Selected Schools 








For each school that was selected into the 
sample, the interviewer was requested to contact
the principal and to obtain the following information
about the school:


1. The grades that were included in the 
school 
2. A list of all homeroom classes in grades 
6 and above giving: 
a) the grade to which each class belonged 
b) the number of girls in each class, or, 
if that was not available, the total num- 
ber of students in each class. Home- 
room classes (rather than all classes) 
were to be listed so that each girl would 
be uniquely associated with one and only 
one class. 


Thus, each homeroom class was given the same 
probability of appearing on one of these lists as 
the probability of selection of the school in which 
it was located. From these lists, classes were 
selected with probability proportional to the esti- 
mated number of girls in each class. A random 
number between 1 and the interval for selecting 
classes was picked, and the interval was added 
successively to the random start to determine 
the selection numbers. The estimated numbers 
of girls in each class were cumulated, and when 
the cumulation spanned one of the selection num- 
bers, that class was selected into the sample. 

In the example, if we had selected school B,
with 1160 students in the school, we would have
expected that there would be 1/2 x 1160 or about
580 girls in the school. Since we had decided
that we should select about two classes per 
school, the interval for selecting classes in this 
school would be 580/2 or 290. A random num- 
ber between 1 and the interval, in this case 290, 
would be picked. Let us assume the random 
number was 042. This random number would 
determine the first class selection; the random 
number plus the interval, or 042 + 290 = 332, 
would determine the second class selection, etc. 
The number of girls in the classes would then 
be cumulated, and the selected classes deter- 
mined. For example: 


Class               No. girls    Cum. No.    Selec.
Identification      in class     of girls    Nos.

grade 9 sec. 1         19           19
grade 9 sec. 2         22           41
(grade 9 sec. 3        20           61         42)
. . .
grade 10 sec.          17          325
(grade 11 sec. 1       21          346        332)
grade 11 sec.          15          361











Since the selection number 42 was spanned by 
the range 42-61, the class ‘‘grade 9 section 3’’ 
would be selected into the sample with a probabil- 
ity of 20/290. Similarly, ‘‘grade 11 section 1’’ 
would be selected into the sample with a probabil- 
ity of 21/290. 

It should be noted here that we selected classes
at a certain rate and not on a quota basis. This
rate was related to the estimated number of students
in the school, that is, to the number that
was used in selecting the school, and not to the actual
number of students found in the school. To
the extent that the actual number of students in
the school differed from the estimated number,
we selected more or fewer than the desired two
classes. In our example, the interval of 290 to
be applied in selecting classes in school B was determined
by dividing the estimated number of students
in the school by 2 to obtain the estimated
number of girls, i.e., 1160/2 = 580. Then the
estimated number of girls was divided by 2 to obtain
the interval for selecting classes, i.e., 580/2 = 290.
This second division by 2 was necessary
because the sample was designed with the
aim of selecting about two classes per school. The
interval of 290 was then applied to the cumulative
list of classes. If the school had only 290 girls
enrolled, then only one class would be selected. If
the school had 870 girls enrolled, then three classes
would be selected, etc.4
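
The difference between selecting at a rate and selecting a quota can be made concrete with a short sketch. The function name `classes_selected` is hypothetical. It simply counts how many selection numbers fall within the cumulated list of girls actually found in the school, given the interval fixed in advance from the estimate.

```python
def classes_selected(actual_girls, interval, random_start):
    """Number of class selections produced by a fixed interval.

    The interval comes from the ESTIMATED enrollment; the cumulation
    runs over the girls ACTUALLY listed, so the count can differ from
    the desired two classes.
    """
    if not 1 <= random_start <= interval:
        raise ValueError("random_start must lie between 1 and the interval")
    # Selection numbers: random start, then the interval added successively,
    # for as long as they fall within the cumulated list.
    return len(range(random_start, actual_girls + 1, interval))


INTERVAL = 1160 // 2 // 2  # estimated students -> estimated girls -> interval

# Counts for the enrollments mentioned in the text, using the start 042.
counts = {girls: classes_selected(girls, INTERVAL, 42)
          for girls in (290, 580, 870)}
```

For enrollments that are exact multiples of the interval the count does not depend on the random start. For other enrollments it does: with 400 girls, a start of 42 would give two classes and a start of 200 only one.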


8. Selection of Girls Within Sample Classes 





For each class selected into the sample, the 
interviewer was sent a form containing the school 
and class identification. On this form she was 
requested to list the names of all girls in the designated
class, each girl to be listed on a separate
line. When the listing was completed, she was to 
lift the tape at the bottom of the form; under this 
tape were written the line numbers that were to 
be included in the sample. The interviewer then 
checked each such designated line number; the 
girl whose name appeared on each checked line 
was thus designated to be interviewed. 

To determine the sample line numbers, it was 
first necessary to determine the interval for se- 
lecting girls within the sample class. This inter- 
val was obtained by dividing the estimated num- 
ber of girls in the class (i.e., the number that 
was used in selecting the class) by 4, since we 
had decided that we wished to select about four 
girls per sample class. After determining the 
interval, a random number between 1 and the in- 
terval was picked, and the interval was added 
successively to the random start. This produced 
a series of selection numbers which determined 
the line numbers which were to be included in 
the sample. For example, when ‘‘grade 9 sec- 
tion 3’’ was selected, the estimated number of 
girls for that class was 20. Therefore, to determine
the interval for selecting girls in that
class, we would merely divide 20 by 4 and obtain
the interval 5 (see footnote 5). A random number between 1
and 5 would then be selected. Let us assume
that the random number was 1. The interval, 5,
would be added successively to the random start,
1, and the selection numbers would be determined.
Thus, lines 1, 6, 11, 16, 21, 26, etc.,
would be selected into the sample, and the girls
whose names had been listed on those lines would
be designated for interviewing.

It should be noted again that we selected girls 
within a class at a certain rate. If there were 
actually more or fewer girls in a class than we 
estimated, the result was the selection of more 
or fewer girls than the desired four girls per 
class. As in the example, if there actually 
were 25 girls in the class, where we had esti- 
mated only 20, then five girls would have been 
selected to be interviewed rather than four. 
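
The determination of sample line numbers can be sketched as below. The function name `sample_lines` is hypothetical, and whole-number division is used for the interval as a simplification. How intervals that are not whole numbers were handled is not described in this section.

```python
def sample_lines(estimated_girls, actual_girls, random_start, girls_per_class=4):
    """Line numbers on the listing form that fall into the sample.

    The interval is fixed from the ESTIMATED class size; the lines run
    over the girls ACTUALLY listed, so more or fewer than four may result.
    """
    interval = estimated_girls // girls_per_class
    if not 1 <= random_start <= interval:
        raise ValueError("random_start must lie between 1 and the interval")
    return list(range(random_start, actual_girls + 1, interval))


# "Grade 9 section 3": estimated 20 girls, random start 1.
as_estimated = sample_lines(20, 20, 1)   # class size as estimated
larger_class = sample_lines(20, 25, 1)   # five more girls than estimated
```

With 20 girls listed, the lines selected are 1, 6, 11 and 16. With 25 listed, line 21 is added, which gives the five interviews mentioned in the text.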


9. Establishing Sampling Rates at the 
Various Stages of Selection 








The total number of girls in grades 6 through 
12 in school in the entire country was estimated 
to be between 6.8 million and 7.6 million. (Esti- 
mates were obtained from a number of sources, 
no one source feeling that their estimate was 
very accurate.) It was felt that the true figure 
probably was somewhere between 7.0 and 7.3 
million. On the basis of these figures, it was 
decided to set the basic sampling rate at 1/3625. 
That is, every girl in the defined population in 
the country was to have one chance in 3625 of be- 
ing selected into the sample. If the population 
were in fact about 7.0 million, and each girl had 
one chance in 3625 of being selected into the sam- 
ple, a total of about 1930 girls would be selected 
(7,000,000/3625 = 1930). This figure, 1930,
seemed a reasonable one at which to aim, since
there would undoubtedly be some loss, because
girls had dropped out of school, or for some
reason could not be interviewed (because of illness,
refusal, and so on). If this loss amounted
to about 5%, then the total number of interviews 
that would be obtained would be approximately 
1930 x .95 = 1833, which was consistent with the 
aim of obtaining approximately 1800 interviews. 
(To the extent that the population was actually 
smaller or larger than 7.0 million, our sample 
would yield fewer or more than estimated. ) 
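
The arithmetic behind the choice of the basic sampling rate can be checked with a short calculation. The function name `expected_counts` is hypothetical, and the 5% loss is the figure assumed in the text. The results differ from the rounded figures quoted above by about one interview.

```python
SAMPLING_RATE_DENOMINATOR = 3625  # each girl has one chance in 3625


def expected_counts(population, loss_rate=0.05):
    """Expected girls selected and interviews obtained for a population size."""
    selected = population / SAMPLING_RATE_DENOMINATOR
    interviewed = selected * (1 - loss_rate)
    return selected, interviewed


# The working figure of 7.0 million, and the extremes of the estimates.
selected_7m, interviewed_7m = expected_counts(7_000_000)
low_selected, low_interviewed = expected_counts(6_800_000)
high_selected, high_interviewed = expected_counts(7_600_000)
```

Under these assumptions the expected number of interviews stays near the target of 1800 at the low end of the population estimates and rises to roughly 2000 at the high end.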

The sample was drawn in four stages: (1) a 
sample of primary sampling areas (counties) had 
been drawn; (2) within each primary sampling 
area schools were selected; (3) within selected 
schools, classes were selected; and (4) within 
each selected class, girls were selected for in- 
terviewing. At each stage of selection a tech- 
nique known as sampling with probability propor- 
tional to size was used. This procedure permits 
one to obtain equal size sample clusters from unequal
size sampling units, and still maintain
overall equal probability of selection of the final 
sampling units (girls). 

Specifically, sampling with probability propor- 
tional to size implies a procedure whereby the 
probability assigned to each sampling unit is di- 
rectly proportional to the size of the sampling un- 
it. Sub-sampling within selected sampling units, 
then, is based on probabilities inversely propor- 
tional to the size of the sampling unit.6
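
The way in which the two proportionalities cancel can be verified with exact fractions. The sketch below is illustrative only: it reuses the six school sizes of the earlier example as the sampling units, and the choices of two units and a cluster of eight are assumptions made for the demonstration.

```python
from fractions import Fraction


def overall_probability(unit_size, total_size, units_to_select, take_per_unit):
    """Overall chance for one element under PPS with inverse sub-sampling."""
    # Stage 1: probability directly proportional to the size of the unit.
    p_unit = Fraction(units_to_select * unit_size, total_size)
    # Stage 2: probability inversely proportional to the size of the unit.
    p_within = Fraction(take_per_unit, unit_size)
    return p_unit * p_within


sizes = [871, 1160, 981, 781, 1040, 967]  # unequal sampling units
probabilities = {overall_probability(s, sum(sizes), 2, 8) for s in sizes}
```

The set `probabilities` contains a single value, since the unit size cancels in the product. Every element has the same overall chance, and every selected unit yields a cluster of the same expected size.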

Using probability proportional to size in selecting
sampling units at the various stages of selection
permitted us some control over how the
sampling units were to be distributed. It was decided
that the girls selected into the sample should
be distributed in the following manner: about four
girls should be selected for interviewing from
each selected class, and about two classes should
be selected from each selected school. This would 
mean that, on the average, we would expect about 
eight girls in the sample from each school that 
was selected. 

Since the 1800 interviews were to be distribu- 
ted over 66 primary sampling areas, it would 
mean that, on the average, about 25-30 interviews
would be taken in each primary sampling
area. If these 25-30 interviews were to be taken 
in clusters of eight interviews per school, there 
would be on the average about three or four schools 
selected in each primary sampling area. 
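
The cluster sizes quoted above follow from simple division, as the following sketch shows. The variable names are illustrative, and the figures are those given in the text.

```python
TARGET_INTERVIEWS = 1800
PRIMARY_SAMPLING_AREAS = 66
GIRLS_PER_CLASS = 4
CLASSES_PER_SCHOOL = 2

# About eight girls are expected from each selected school.
girls_per_school = GIRLS_PER_CLASS * CLASSES_PER_SCHOOL

# Average workload per primary sampling area.
interviews_per_area = TARGET_INTERVIEWS / PRIMARY_SAMPLING_AREAS

# Average number of schools needed per area to yield that workload.
schools_per_area = interviews_per_area / girls_per_school
```

These averages come to a little over 27 interviews and a little under three and a half schools per primary sampling area.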

These decisions on the manner in which the 
sample should be distributed took into considera- 
tion both the cost involved in carrying out the sur- 
vey and the precision of the estimates that would 
be made from the survey data. 

Now, on the basis of these decisions, the sam- 
pling rates at each stage of selection could be 
worked out. The setting of the rates was to abide 
by two basic restrictions: 


1. That the product of the various probabilities
used at the different stages of selection
be equal to 1/3625. That is, for each girl
in our population: (probability of selecting
her primary sampling area) x (probability
of selecting her school within her primary
sampling area) x (probability of selecting
her class within her school) x (probability
of selecting the girl within her class)
= 1/3625.


2. That the probabilities used at the various
stages of selection be set in such a way that
the desired clustering would result, that
is, so that we would select about four girls
per class and about two classes per school.


Now, it was possible for us to write down a 
formula that would abide by these restrictions, 
and that would tell us specifically how to determine
each probability at each stage of selection,
for every primary sampling area, every school,
every class, and every girl in our population. 

The first stage in developing this formula was 
to write down the basic description of what we 
wished to do. In this basic description we spec- 
ified that the sample would be drawn in four 
stages, and that every girl in the defined popula- 
tion would have an equal chance (1 chance in 3625) 
of being selected into the sample. 


$$P_{psa} \times P_s \times P_c \times P_g = \frac{1}{3625}$$


where:

$P_{psa}$ = probability of selecting a primary
sampling area

$P_s$ = probability of selecting a school within
a primary sampling area

$P_c$ = probability of selecting a class within
a selected school

$P_g$ = probability of selecting a girl within
a selected class


Using this basic formula, we worked in the de- 
sired cluster aims. 

The 1/3625 was fixed, since we had decided 
that this probability would produce the desired 
size sample. 

The $P_{psa}$ was fixed for each primary sampling
area, for the purposes of selecting this
sample, since we wished to use the basic sampling
areas of the Survey Research Center. Each
primary sampling area in the basic sample already
had a determined probability of selection. This
probability was equal to the number of people in
the primary sampling area divided by the number
of people in the stratum from which it was
selected. Let us call this probability $N_{ix}/N_x$,
where $N_{ix}$ was the number of people in the ith
primary sampling area in the xth stratum, and $N_x$
was the number of people in the entire xth stratum.
(The subscript i indicates there could be
more than one primary sampling area in a stratum,
although only one was selected into the
sample from each stratum. The subscript x
indicates that there is more than one stratum.)

Having fixed these two probabilities, we could 
start setting the other terms in the equation. 

First, we worked with the probability of selecting
girls within a class, $P_g$. We wished to
select four girls out of the total number of girls
in a selected class. The probability of selecting
a girl within a selected class should therefore be
$4/G_{ijk}$, where $G_{ijk}$ was the estimated number
of girls in the kth class in the jth school in the
ith primary sampling area.

Having set this probability, we proceeded to
the next term, $P_c$, the probability of selecting
a class in a selected school. Since it was desired
to select about two classes per school, the
probability of selecting a class was set at
$2 \times 2 \times G_{ijk}/S_{ij}$, where $G_{ijk}$ was
the estimated number of girls in the kth class in
the jth school in the ith primary sampling area,
and $S_{ij}$ was the estimated number of students
in the jth school in the ith primary sampling area.
The numerator in the term has two 2's; the first
specifies that we wished to select about two classes
per school, and the other that we expected, on the
average, there would be about twice as many students
as there were girls in the school. In other
words, our estimate of the total number of girls
in the school would be $S_{ij}/2$. (If we had selected
a school containing all girls, then we would have
expected to select four classes from that school.)

The only term remaining to be defined was
the probability of selecting schools. This term
was residual; in other words, it was dependent
upon values assigned to the other terms; it had to
take on that value which would make the equation
hold true. The probability of selecting schools
therefore had to be


$$P_s = \frac{S_{ij}}{\dfrac{N_{ix}}{N_x} \times 2 \times 2 \times 4 \times 3625}$$


Putting these terms together, the formula that
expressed the probability that each girl in our
population had of being selected into the sample
was:

$$\frac{N_{ix}}{N_x} \times \frac{S_{ij}}{\dfrac{N_{ix}}{N_x} \times 2 \times 2 \times 4 \times 3625} \times \frac{2 \times 2 \times G_{ijk}}{S_{ij}} \times \frac{4}{G_{ijk}} = \frac{1}{3625}$$


The probabilities used in the example in sections
2-8 can be written in the same form, showing
how the selections were made at the different
stages.


1. The probability of selecting the primary sampling
area in the example was

$$\frac{N_{ix}}{N_x} = \frac{100{,}000}{2{,}000{,}000} = \frac{1}{20}$$


2. The probability of selecting school B was

$$\frac{S_{ij}}{\dfrac{N_{ix}}{N_x} \times 2 \times 2 \times 4 \times 3625} = \frac{1160}{\dfrac{1}{20} \times 2 \times 2 \times 4 \times 3625} = \frac{1160}{2900}$$


3. The probability of selecting the class "grade
9 section 3" was

$$\frac{2 \times 2 \times G_{ijk}}{S_{ij}} = \frac{2 \times 2 \times 20}{1160} = \frac{20}{290}$$


4. The probability of selecting a girl in class
"grade 9 section 3" was $4/G_{ijk}$, or 4/20.
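
The four stage probabilities of this example can be checked with exact fractions. The sketch below is illustrative only: the function name `stage_probabilities` and its parameter names are assumptions introduced here, while the numbers (a primary sampling area of 100,000 in a stratum of 2,000,000, a school measure of 1160, a class measure of 20) are those of the example.

```python
from fractions import Fraction

OVERALL_RATE = Fraction(1, 3625)  # equal chance given to every girl


def stage_probabilities(n_ix, n_x, s_ij, g_ijk,
                        classes_per_school=2, girls_per_class=4):
    """Return (P_psa, P_s, P_c, P_g) for one girl, as exact fractions."""
    p_psa = Fraction(n_ix, n_x)
    p_g = Fraction(girls_per_class, g_ijk)
    # The extra factor 2 reflects the assumption that girls are about
    # half of the students in a school.
    p_c = Fraction(classes_per_school * 2 * g_ijk, s_ij)
    # The school probability is residual: whatever makes the product
    # of all four stages equal to the overall rate.
    p_s = OVERALL_RATE / (p_psa * p_c * p_g)
    return p_psa, p_s, p_c, p_g


p_psa, p_s, p_c, p_g = stage_probabilities(100_000, 2_000_000, 1160, 20)
print(p_psa, p_s, p_c, p_g)        # 1/20 2/5 2/29 1/5
print(p_psa * p_s * p_c * p_g)     # 1/3625
```

The fractions print in lowest terms, so 1160/2900 appears as 2/5 and 20/290 as 2/29.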


Putting these all together, we obtain the following
equation:

$$\frac{1}{20} \times \frac{1160}{2900} \times \frac{20}{290} \times \frac{4}{20} = \frac{1}{3625}$$

At each stage of selection, estimates were
used as measures of size for selection. To the
extent that these estimates or measures of size
did not exactly correspond to the actual size of
the sampling unit, our sample was larger or
smaller than expected. For example, a measure
of size, $S_{ij}$, was assigned to school j in the
ith primary sampling area. If school j was selected,
the probability of selecting classes within
the school was already determined. No matter
how large school j actually was, one class would
be selected from it for every $S_{ij}/4$ girls in
the school. If school j contained exactly $S_{ij}$
students, and exactly half of these students were
girls, we would select exactly two classes from
the school. To the extent that the school contained
more or fewer than $S_{ij}$ students or that
more or fewer than half of the students were
girls, then more or fewer than the expected two
classes would be selected. Thus, inaccurate
measures of size were "corrected" for by the
procedure used. This "correction" occurred at
each stage of the sample selection. The probabilities
were maintained, in spite of discrepancies
between the measures of size used in selection
and the actual size of the units. The discrepancies
were reflected in the selection of
more or fewer units than desired. Although
inaccuracies in measures of size in no way affected
the ultimate probability of selection of girls,
it was nevertheless important to use relatively
accurate measures of size in order to obtain the
desired clustering.
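
The effect of an inaccurate measure of size on the number of classes drawn can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure actually programmed for the survey: it supposes that the class measures $G_{ijk}$ add up to the number of girls in the school, so that the expected number of classes is the sum of the class probabilities $2 \times 2 \times G_{ijk}/S_{ij}$. The function name and the trial figures are hypothetical.

```python
from fractions import Fraction


def expected_classes(girls_in_school, s_ij):
    """Expected number of classes drawn from a selected school.

    Assumes the class measures G_ijk sum to girls_in_school, so the
    class probabilities 2 * 2 * G_ijk / S_ij sum to 4 * girls / S_ij.
    """
    return Fraction(4 * girls_in_school, s_ij)


s_ij = 1160  # measure of size assigned to the school before selection

# 580 girls is exactly half of the measure; the other two figures
# stand for schools with fewer or more girls than anticipated.
for girls in (580, 435, 725):
    print(girls, float(expected_classes(girls, s_ij)))
```

With exactly half of the measure (580 girls) the expectation is two classes; with 435 or 725 girls it falls to 1.5 or rises to 2.5, while the overall probability for each girl stays fixed.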

The table shown on the next page gives the 
number of schools, classes and girls that were 
selected into the sample from each of the three 
strata, and the number of interviews obtained 
from each stratum. 

One can see from the results in the table that 
the aims of selecting about two classes per sam- 
ple school and about four girls per sample class 
were fairly well met. We, in fact, selected an 
average of 1.87 classes per sample school, and 
3.96 girls per sample class. 

In all, there were 2004 girls selected into the 
sample, and of these 2004 girls, interviews were 
obtained from 1925, giving an overall response 
rate of 96%. The chief reason for the 79 non-in- 
terviews was the unwillingness on the part of a 
few schools to cooperate and permit the selected 
girls to be interviewed. (It should be mentioned, 
however, that the vast majority of schools were 
extremely cooperative during all phases of the 
survey.)

On the basis of the survey, we could now ob- 
tain an estimate of the total number of girls in 
our population. (This was, of course, incidental 
to the purposes for which the study was conducted.) 
The estimate was made by multiplying the total 
number of girls selected into the sample by the 
inverse of the sampling rate. Thus we would es- 
timate that there are 2004 x 3625 = 7.26 million 
girls in our defined population. This comes close 
to the best estimates of the population, hence the 
sample appears to have obtained a high "coverage"
rate.
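
The response rate and the population estimate quoted above follow from simple arithmetic, retraced here as an illustrative sketch (the variable names are chosen for this example only).

```python
girls_selected = 2004
girls_interviewed = 1925
inverse_sampling_rate = 3625  # each sampled girl stands for 3625 girls

non_interviews = girls_selected - girls_interviewed
response_rate = girls_interviewed / girls_selected
population_estimate = girls_selected * inverse_sampling_rate

print(non_interviews)              # 79
print(f"{response_rate:.1%}")      # 96.1%
print(population_estimate)         # 7264500
```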


10. An Example of How the Basic Sample De- 
sign Could be Applied to Another Population 








The basic sample design described in this 
paper could be applied, in varying degrees of com- 
plexity, to other populations for use on widely dif- 
ferent types of surveys. An example of how this 
could be done is presented below. 

Suppose we wished to conduct a survey that
would involve interviewing a sample of twelfth
grade students in the State of Michigan to determine
their attitudes toward going to college. It
would be a time-consuming and expensive procedure
to interview students if they were spread
widely throughout the state. A clustered sample
would be more economical and could be drawn,
using the same type of procedures described in
this paper. First, a sample of schools⁷ would
be selected from the Michigan Education Directory
and Buyer's Guide.⁸ The schools would be
selected with probability proportional to the
estimated number of twelfth grade students. (The
number of twelfth grade students in a school
could be estimated from such things as size of a
recent graduating class, total enrollment, total
number of teachers in the school, and so on.)
Within each of the selected schools, a list of all
twelfth grade classes would be obtained, and
from this list, classes would be selected with
probability proportional to the estimated number
of students in the class. Within selected classes,
the names of the students would be obtained, and
students selected for interviewing. The desired
clustering could be maintained at each stage of
selection; that is, one would be free to decide
upon the number of classes one would like to
select, on the average, from each school, and the
number of students one would like to select, on
the average, from each class. By using relatively
accurate estimates of size in selecting
schools and classes, the desired clustering
would be achieved.
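
One common way to select units with probability proportional to estimated size is systematic selection from a cumulated list with a random start and a fixed interval. The text does not spell out the mechanics for the Michigan example, so the sketch below is an assumption; the function names, school labels, enrollments, and interval are all hypothetical.

```python
import random
from itertools import accumulate


def pps_from_start(sizes, interval, start):
    """Indices of units hit by the points start, start + interval, ...

    Sizes are cumulated; a unit is selected once for every point that
    falls within its range of the cumulated measures.
    """
    selected = []
    point = start
    for index, upper in enumerate(accumulate(sizes)):
        while point <= upper:
            selected.append(index)
            point += interval
    return selected


def pps_systematic(sizes, interval, rng=random):
    """Draw a random start in 1..interval, then select systematically."""
    return pps_from_start(sizes, interval, rng.randint(1, interval))


# Hypothetical estimated twelfth grade enrollments for four schools.
schools = ["A", "B", "C", "D"]
enrollments = [120, 300, 80, 200]

# A fixed start is used here only to make the illustration repeatable.
chosen = pps_from_start(enrollments, interval=350, start=100)
print([schools[i] for i in chosen])  # ['A', 'C']
```

Each unit's chance of selection is its measure of size divided by the interval, so the interval fixes the sampling rate while the measures of size control how evenly the clusters come out.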

This type of procedure would not only be more 
economical as far as the field work is concerned, 
but also as far as the work involved in selecting 
the sample. It would, indeed, be a laborious 
task to obtain the names of all twelfth grade 
students in the State of Michigan, yet such a list
would be necessary if one were to select a sample 
of students directly. A sample selected in sev- 
eral stages permits a great deal of freedom in 
establishing convenient and efficient size clus- 
ters. It also reduces the amount of data neces- 
sary for use in the actual selection of the sample. 


APPENDIX 


11. Generalizing Findings from Sample Sur- 
vey to Entire Population 








Since each girl in our population was given an
equal chance of being selected into the sample, 
proportions or means computed for the sample 
are unbiased estimates of the proportions or 
means for the entire population. For example, 
if we find that 47% of all girls in the sample do 
not date, then we would estimate that 47% of all 
girls in the defined population do not date. This 
type of estimating procedure applies to sub- 
groups as well as to the whole sample. For ex- 
ample, if we find that 26% of girls 14 through 16 
years old in the sample do not date, then we 
would estimate that 26% of all girls 14 through 
16 years old in the defined population do not date. 

These estimates are unbiased estimates of
population characteristics, but they are subject
to sampling error. Sampling error is that error
which results from interviewing only a sample
from a population rather than the entire population.
There is always the possibility that by
chance the sample will contain too many or too
few girls who do not date, too many or too few
girls who do not receive an allowance, and so on.
Because we have selected a probability sample,
we can compute the size of this sampling error.
The usual statistical textbook procedures for
computing sampling error cannot be used for this
sample, however, because these procedures assume
that the selection of each element (i.e., girl)
was independent. This is not the case in this
sample, because girls were selected in clusters
(i.e., in the same primary sampling area, the
same school, the same class). Special procedures
that take into consideration the design of
the sample must be used.

The size of the sampling error (i.e., the ex- 
tent to which the sample findings may overesti- 
mate or underestimate the true population fig- 
ures) is largely dependent on the number of in- 
terviews, but there are other factors as well. 
With a sample of a given size the smallest sam- 
pling error would be obtained if the cases in the 
sample were widely scattered throughout the 
area sampled, with no two interviews taken in 
the same place. Because this kind of sample is 
prohibitive from the standpoint of time and ex- 
pense, the interviews for this survey were clus- 
tered within primary sampling areas, schools 
and classes. Clustering increases the sampling 
error, and the proper procedures for computing 
the sampling error will take this into considera- 
tion. One method for computing sampling er- 
rors for this sample design is presented here. 

The correct computation of sampling errors
necessitates computing the variability among the
primary sampling units, i.e., those units that
were selected at the first stage of selection. In
our sample, the primary sampling units for the
twelve largest metropolitan areas in the country
were schools,⁹ while for the remainder of the
country the primary sampling units were the primary
sampling areas ("counties"). Therefore,
in computing the sampling error, we should
measure the degree of variability among schools
in the twelve largest metropolitan areas in the
country, and the degree of variability among primary
sampling areas for the remainder of the
country.¹⁰ Because of the large number of
schools that were selected from among the 12
largest metropolitan areas, it would be economically
more efficient to combine schools into
groups for the purposes of computing the sampling
error.¹¹ For example, in each of these
12 metropolitan areas, the schools that were in
the sample can be divided randomly into two
groups, forming 12 pairs of groups.¹² For these
12 metropolitan areas, then, variability can be
measured by comparing the two sample groups
in each metropolitan area. Each of the 54 primary
sampling areas, which represent the remainder
of the country, should be considered a
group for the purposes of computing the sampling
errors. These 54 primary sampling areas
can be paired to form 27 pairs of groups.¹³

After the groups have been formed and the
pairing determined, sampling errors can then
be computed. Let us suppose that we wished to
compute the sampling error for the proportion
of girls 14 through 16 years old who do not date.
For the entire sample, we would obtain two figures:
(1) the number of girls 14 through 16
years old who do not date, which we will refer
to as y; and (2) the number of girls 14 through
16 years old, which we will refer to as x. Then,
the proportion of girls 14 through 16 years old
who do not date may be expressed as y/x. Now,
for each of the 78 groups (24 in the 12 largest
metropolitan areas and 54 other groups), we can
obtain figures for the same two variables. Let us
identify the groups by two subscripts. The subscript
m will identify the pair of groups and the
subscripts a and b will identify the particular
group within a pair. We can then compute $x_{ma}$
and $x_{mb}$ for each of the pairs of groups, and $y_{ma}$
and $y_{mb}$ for each of the pairs, where


$x_{ma}$ = number of girls 14 through 16 years
old in the sample in group a of pair m
$x_{mb}$ = number of girls 14 through 16 years
old in the sample in group b of pair m
$y_{ma}$ = number of girls 14 through 16 years
old who do not date and who are in group
a of pair m
$y_{mb}$ = number of girls 14 through 16 years
old who do not date and who are in group
b of pair m


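The following formula¹⁴ is then used to compute
the variance:


$$\operatorname{var}\left(\frac{y}{x}\right) = \frac{1}{x^2} \sum_{m=1}^{39} \left[ (y_{ma} - y_{mb}) - \left(\frac{y}{x}\right)(x_{ma} - x_{mb}) \right]^2$$

where

$$\frac{y}{x} = \frac{\sum_{m=1}^{39} (y_{ma} + y_{mb})}{\sum_{m=1}^{39} (x_{ma} + x_{mb})}$$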

The $\sqrt{\operatorname{var}(y/x)}$ is the "standard error" of the
estimate y/x. That is, it is the range on either
side of the estimate y/x within which the true
population figure would be expected to fall with
68 chances out of 100. If we wished to know the
range within which the true population figure
would be expected to fall with 95 chances out of
100, we would use two standard errors. For example,
suppose we estimated that 26% of all girls
14 through 16 years old in our population do not
date, and the standard error of this estimate was
calculated to be 1½%. We would have 95 chances
out of 100 of being correct when we said that the
true population figure would lie in the range of
26% - 2(1½%) and 26% + 2(1½%), or between 23%
and 29%.
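
The paired-group computation can be sketched in code. This is a minimal illustration of the variance formula for y/x as described in the text, not the program actually used for the study; the function name is an assumption, and the three pairs of counts are invented for the example (the survey itself used 39 pairs).

```python
import math


def paired_ratio_standard_error(pairs):
    """Return (y/x, standard error) from paired group counts.

    Each element of pairs is (x_ma, y_ma, x_mb, y_mb): the x and y
    counts for groups a and b of pair m.
    """
    x = sum(x_a + x_b for x_a, _, x_b, _ in pairs)
    y = sum(y_a + y_b for _, y_a, _, y_b in pairs)
    ratio = y / x
    # Sum over pairs of [(y_ma - y_mb) - (y/x)(x_ma - x_mb)] squared.
    total = sum(((y_a - y_b) - ratio * (x_a - x_b)) ** 2
                for x_a, y_a, x_b, y_b in pairs)
    return ratio, math.sqrt(total / x ** 2)


# Hypothetical counts for three pairs of groups.
pairs = [(40, 10, 50, 14), (30, 9, 35, 8), (45, 12, 42, 10)]
ratio, std_error = paired_ratio_standard_error(pairs)

print(round(ratio, 3), round(std_error, 3))
# Range of two standard errors on either side of the estimate.
print(round(ratio - 2 * std_error, 3), round(ratio + 2 * std_error, 3))
```

When the two groups of every pair give identical counts, the squared differences vanish and the computed standard error is zero, which is a quick sanity check on the formula.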

Because the calculation of sampling errors
for this type of sample design is somewhat costly
and time consuming, calculations are not usually
made for each of the many statistics derived
from the survey. Rather, sampling error calculations
are made for a selection of items, and the
results of these calculations are combined to
form a generalized sampling error table. This
generalized table gives the approximate size of
the sampling errors for various estimated proportions
based on varying size subgroups of the
sample. Such a table was developed for this
study and indicated that the actual sampling errors
ranged from approximately 1.04 times simple
random sampling error for proportions
based upon 100 cases to approximately 1.7 times
simple random sampling error for proportions
based upon 2000 cases.
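
To give a sense of scale, the multipliers just quoted can be applied to the usual simple random sampling formula for the standard error of a proportion, the square root of p(1 - p)/n. The sketch below is illustrative only: the proportion of 26% is borrowed from the earlier example, the finite population correction is ignored, and the function names are assumptions.

```python
import math


def srs_standard_error(p, n):
    """Simple random sampling standard error of a proportion p on n cases."""
    return math.sqrt(p * (1 - p) / n)


def clustered_standard_error(p, n, factor):
    """Apply a multiplier read from a generalized sampling error table."""
    return factor * srs_standard_error(p, n)


# Multipliers quoted in the text for 100 and 2000 cases.
for n, factor in ((100, 1.04), (2000, 1.7)):
    se = clustered_standard_error(0.26, n, factor)
    print(n, f"{100 * se:.2f}%")
```

The larger multiplier for larger subgroups reflects the fact that a subgroup of 2000 cases necessarily includes many girls from the same clusters.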

Although the generalized table is adequate for
most needs, whenever a more exact estimate of
the sampling error is needed for an item, sampling
error calculations are made for that specific
item. For this study, actual sampling errors
were calculated for 96 proportions with an
electronic computer. These detailed calculations
indicated a range for the actual sampling
errors of from 1.0 to 3.2 times simple random
sampling errors.¹⁵


12. Problems Encountered in Collecting Data 
Necessary for Sample Selection 








For the most part we had no trouble in obtain- 
ing the data we needed for selecting schools, 
classes, or girls. There were, however, some 
problems in obtaining the lists of schools in the 
form in which we would have liked them for the 
easiest selection procedures, and some compli- 
cations were encountered because of the necessi- 
ty of using different types of lists, i.e., the Di- 
rectory, lists of new secondary schools and 
lists of elementary schools. A brief description
of the types of problems encountered, and the 
methods by which they were solved follows. 


1. Definition of secondary and elementary schools.
Because secondary and elementary schools
were sampled separately, it was necessary to
define each type of school. There was an indication
that the Directory used local definitions
in making up their list of secondary
schools, and we, too, accepted this as our
definition. When a school contained more
than just the elementary grades or just the
secondary grades, e.g., a school that includes
grades 1-12, we considered that the school
was in fact two schools, an elementary school
and a secondary school, and for all purposes
of sampling we treated the school as two
separate schools. The school was divided in two
on the basis of local definitions of grades
included in elementary and secondary schools,
where local definitions existed. If there were
no clear local definition, then grades 1-8 were
considered elementary and grades 9-12 were
considered secondary.

2. Schools appearing on more than one list. In
order to insure the proper probability of
selection, every school had to be listed on one
and only one list, i.e., in the Directory, on
the list of new secondary schools or on the list
of elementary schools. This necessitated editing
the lists in order to remove schools from
lists on which they did not belong.


3. Consolidation of schools. Because the Directory
was several years old when we used it,
changes had occurred between the time the
Directory was compiled and the time our survey
was done. If a school was selected, and we
found that it had been consolidated with several
other schools, the consolidated school was
considered to have been selected. An adjustment
was made on the probability of selection,
however, since the selection of any one of several
schools would have brought the consolidated
school into the sample. The probability of
selecting the consolidated school was in fact
the sum of all of the probabilities of the
schools that made up the consolidated school,
and it was considered to be that.


4. Schools no longer in existence. Some schools
had closed since the Directory was compiled.
If a selected school was no longer in existence
(as distinguished from being consolidated with
several others), it was merely dropped from
the sample.

5. Combining schools and classes for sufficiency.
In order to be able to select a given number
of girls from a particular school or class, the
school or class must be estimated to be as
large as the number of selections to be made
from it. For this reason a minimum size was
set for schools and for classes. The minimum
size for schools was arbitrarily set at 50 students,
while the minimum size for classes
was set at four girls. Any school or class
that was estimated to be smaller than the minimum
size required was "paired" with other
schools or classes to form a sampling unit
that would be at least as large as the minimum
size. Thus, for the purposes of sampling,
two or more schools might be combined
and treated as one school "unit" from which
we would expect to select two classes; two or
more classes might be combined and treated
as one class "unit" from which we would expect
to select four girls.


6. Difficulty in obtaining lists of elementary
schools. In some areas the lists of elementary
schools were filed in each school district,
so that there was no central source from which
a list of all schools in the area could be obtained.
It would have been time-consuming
and expensive to gather together a list of all
elementary schools in the entire area. Since
in all such areas there was a list of school districts,
together with some estimate of the
number of students in each district, we selected
schools in two stages. First, a sample of
school districts was selected and lists of
schools within the selected districts were obtained.
From these lists, schools were selected.


7. Use of homeroom classes. We mentioned
earlier that homeroom classes were listed
for each selected school, and classes selected
from these lists. Of course, many schools
do not have homeroom classes, as such, and
the method of listing classes had to be adjusted
to the system used in the school. The main
objective was to get a list of classes such that
every girl in grades 6-12 in the school would
be associated with one and only one class. At
times this necessitated listing gym classes,
since every girl in the particular school was
enrolled in one and only one gym class. At
times, English classes were used, or "first
hour" classes, and so on. The interviewer
was given instructions explaining
the conditions that had to be fulfilled,
and was given freedom to work out with
the principal a listing that would meet
the requirements. A description of the
type of classes that were listed was sent
in with the list of classes, so that we
could check to make sure the necessary
conditions were met.


8. Class changes in mid-year. Some of the selected
schools operated on a two-semester
school year, which posed some problems because
the listing of classes for many schools
was obtained in December and January, while
the interviewing was done in February and
March. In such cases, we selected classes
from the fall list, and sent the specified
selected classes together with a complete list
of classes in the school to our interviewer.
The interviewer was asked to check with the
principal of the school to see if our list still
was up to date. If the list was still correct,
the interviewer proceeded with the selected
classes. If the list was no longer correct,
the interviewer obtained a correct list, which
she sent to us, and a new selection of classes
was made. Our time schedule was such that
we had to begin gathering data for the sample
selection prior to the beginning of the second
semester. Where feasible, it would be much
more convenient to obtain lists of classes during
the same semester in which the interviewing
is scheduled.


FOOTNOTES


1. The sample was selected for a survey of adolescent
girls, conducted by the Survey Research
Center of the University of Michigan,
under a grant from the Girl Scouts of
America. The interviewing and the analysis
of the survey were conducted in 1956
by the staff of the Survey Research Center.
The sample was designed and selected in
the winter of 1955 in the Sampling Section
of that organization. The author acknowledges
the kind assistance of Leslie Kish,
Head of the Sampling Section.

A sample of similar design was selected
in the fall of 1952 for a study of adolescent
boys, financed by the Boy Scouts of America.
This sample had the added complication
that only boys between the ages of 14
and 17 were interviewed. It appears that
this was the first nationwide interview sample
of school children utilizing school lists
in this country. However, a fine article
describes a sample of school children selected
in 1952 in England and Wales, with
a design similar to the one described in
this paper. See G. F. Peaker, "A Sample
Design Used by the Ministry of Education,"
Journal of the Royal Statistical Society, pp.


2. For a description of the procedures used in
selecting a sample of dwelling units, see:
George Katona and others, "Methods of
Survey of Consumer Finances," Federal
Reserve Bulletin, XXXVI (July 1950), pp.
795-809.


3. "Directory of Secondary Schools, 1951-1952,"
Federal Security Agency, Office of Education.
This Directory covers all public secondary
schools and all accredited private
secondary schools. In addition, a large
number of unaccredited private secondary
schools are listed.


4. In this example, if the school had 300 girls
enrolled, then either one or two classes
would be selected depending on the size of
the random number that was picked. If the
random number picked happened to be any
of the numbers 1 through 10, then two classes
would be selected. If the random number
chosen happened to be any of the numbers
11 through 290, then only one class
would be selected.


5. The use of decimal fractions in intervals can
be handled properly and easily.


6. Leslie Kish, "Selection of the Sample," in
Festinger and Katz (Eds.), Research Methods
in the Behavioral Sciences (New York:
Dryden Press, 1953), pp. 175-240.


7. Schools could be selected directly, or if
more geographical clustering were desired,
counties could be selected first and schools
then selected from within the sample counties.


8. At least 40 of the 48 states have school
directories, and more than three-fourths of
these directories give some measure of
size for the schools listed (e.g., number
of teachers, number of students, number
of rooms in the school, etc.).


9. Since the probability of selection of each of
the 12 largest metropolitan areas was 1,
there was, in effect, no sampling at that
stage. For these 12 areas the first stage
at which a sample of units was selected
was the selection of schools.


10. See Morris Hansen and others, Sample
Survey Methods and Theory, Vol. I (New
York: Wiley and Sons, 1953), Chapter 10,
Section 6.


11. See Hansen and others, op. cit., Sections 13
and 16.


12. In practice we actually form more than 12
pairs of groups to obtain more stability in
the estimate of the sampling error. For
the sake of simplicity, we assumed that
only 12 pairs of groups were formed.


13. As was mentioned earlier, the 54 strata
from which the 54 primary sampling areas
were selected were formed in such a way
that the counties contained in each stratum
were similar on various economic, political,
geographic, and demographic characteristics.
The 54 strata could be ordered
on the basis of these characteristics, and
neighboring strata paired. For example,
the first and second strata would form one
pair, the third and fourth would form a
second pair, etc. On this same basis the
primary sampling areas are paired for the
purposes of computing sampling errors.


14. For detailed instructions on how to compute
sampling errors using this formula, see
Leslie Kish, Irene Hess, "On Variances of
Ratios and of Their Differences in Multi-Stage
Samples," Journal of the American
Statistical Association (to appear).


15. For a detailed discussion of the problem of
the effect of cluster sampling upon sampling
errors, see Leslie Kish, "Confidence
Intervals for Clustered Samples," American
Sociological Review, XXII (April 1957), pp.





JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 26, March, 1958) 


THE EFFECT OF CHANGING THE LENGTH OF 
AN EXAMINATION ON THE INDEX OF 
INTERNAL CONSISTENCY 


WILLIAM J. MOONAN* 
USN Personnel Research Field Activity 
San Diego, California 


Introduction 


THE SAMPLE estimate, r(H), of the index of
internal consistency, $\rho(H)$, is defined as the
Spearman-Brown function of the sample estimate,
r(I), of the intra-class correlation, $\rho(I)$,
of the responses made by the subjects to the I
items of an examination. The statistic r(I) is
referred to in (1, 2, and 3) as the coefficient of
internal consistency. Thus


(1)  $r(H) = \dfrac{I\,r(I)}{1 + (I-1)\,r(I)}$


It is the purpose of this paper to examine the ef- 
fect of simultaneous changes in I and r(I) on the 
value of r(H). The effect on r(H) when the item-
total score correlations are modified will also
be discussed. 

The formulas to be developed and illustrated
in this paper will have most value for the
psychometrist who is concerned with revising
preliminary forms of an examination. The effects
of eliminating items from the tryout form on the
internal consistency reliability can be readily
estimated if the item-item or item-total score
correlations, r(IT), are available.


Effects on r(H) of Simultaneous Changes
in I and r(I)


Unfortunately, the statistical assumptions
which are required in order to derive (1) are often
difficult to fulfill exactly. Not all responses
to the items can be expected to be intra-correlated
to the same degree, nor can all responses be
expected to have the same variance and have
normal density functions. Consequently, most
psychometric formulas, and certainly the ones
used in this paper, must be regarded as approximations
when they are used with real data. However,
statistical literature contains many mathematical
and Monte Carlo arguments which indicate
that considerable latitude may often be taken
with statistical assumptions without unduly distorting
estimations and tests of hypotheses. Nevertheless,
the approximate character of results
obtained from the application of these formulas
must be constantly kept in mind.

For a given examination consisting of I items,
the analysis shown in (1) for M=1 provides a point
estimate of $\rho(I)$ which is an average of $I(I-1)/2$
intraclass correlation terms. If items are deleted
from the examination, as is often the case in
preliminary psychometric analysis, the length of the
examination decreases and ordinarily the average
intraclass correlation among the items, r(I), will
change. This change will probably result in an
increased r(I) since often items are discarded because
the responses of an item do not correlate
with those of other items or with total scores. If
items are discarded for other reasons, such as
inappropriate difficulty levels, r(I) will also be
changed, but this change does not necessarily result
in an increased r(I).

To examine the effect of changing the length of the examination on
r(H), let Δr(H), Δr(I) and ΔI represent either positive or negative
changes in the values of r(H), r(I) and I. The equation

(2)  $$\Delta r(H) = \frac{[I+\Delta I]\,[r(I)+\Delta r(I)]}{1 + [I+\Delta I-1]\,[r(I)+\Delta r(I)]} - \frac{I\,r(I)}{1+(I-1)\,r(I)}$$

then represents the change expected in the value of r(H) when r(I) and
I acquire either the positive or negative increments Δr(I) and ΔI.
Equation (2) is not convenient for determining how much r(I) should be
altered for given changes in r(H) and I. The solution for Δr(I) may be
expressed as

(3)  $$\Delta r(I) = \frac{A}{A+B} - r(I)$$

where

$$A = I\,r(I) + \Delta r(H)\,[1+(I-1)\,r(I)]$$

$$B = [I+\Delta I]\,\{[1-r(I)] - \Delta r(H)\,[1+(I-1)\,r(I)]\}.$$


* The opinions expressed are solely those of the author and are in no
way official; nor are they to be construed as representing those of the
U. S. Naval Personnel Research Field Activity or Bureau of Personnel.


An interesting and useful formula can be obtained from (3) by setting
Δr(H)=0. If this is done, an equation for the Δ₀r(I) required to at
least maintain r(H) at its original value is given by

(4)  $$\Delta_0 r(I) = \frac{I\,r(I)}{I\,r(I) + [I+\Delta I]\,[1-r(I)]} - r(I).$$

The first term on the right of (3) and (4) is the value of r(I)
required in order to meet the specified conditions.
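
The relations among equations (1) through (4) can be checked numerically. The following is only a minimal sketch: the function names are hypothetical and are not taken from the paper, and the inputs are the values quoted later in the numerical illustration (I = 14, r(I) = .21, seven items removed, r(H) to rise by .04).

```python
def spearman_brown(n_items, r_i):
    """Equation (1): r(H) from the number of items I and the average r(I)."""
    return n_items * r_i / (1.0 + (n_items - 1) * r_i)


def delta_r_h(n_items, r_i, d_items, d_r_i):
    """Equation (2): change in r(H) when I changes by d_items and r(I) by d_r_i."""
    return (spearman_brown(n_items + d_items, r_i + d_r_i)
            - spearman_brown(n_items, r_i))


def delta_r_i(n_items, r_i, d_items, d_r_h):
    """Equation (3): change in r(I) needed to change r(H) by d_r_h."""
    denom = 1.0 + (n_items - 1) * r_i
    a = n_items * r_i + d_r_h * denom
    b = (n_items + d_items) * ((1.0 - r_i) - d_r_h * denom)
    return a / (a + b) - r_i


def delta0_r_i(n_items, r_i, d_items):
    """Equation (4): change in r(I) that just maintains r(H)."""
    return delta_r_i(n_items, r_i, d_items, 0.0)


# Values quoted in the numerical illustration of the paper.
needed = delta_r_i(14, 0.21, -7, 0.04)
print(round(spearman_brown(14, 0.21), 2))          # original r(H)
print(round(needed, 3))                            # required increase in r(I)
print(round(delta_r_h(14, 0.21, -7, needed), 3))   # recovers the .04 target
```

Substituting the result of equation (3) back into equation (2) returns the requested change in r(H), which is a convenient consistency check on any hand computation.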


Effects on r(H) of Simultaneous Changes
in I and r²(IT)








In order to use formulas (3) and (4) in practice, one would have to
examine the intraclass correlations among the items. Psychometrists do
not often calculate these correlations although this practice has much
to recommend it. The primary difficulty is one of calculation since,
with an examination consisting of I items, I(I-1)/2 item-item
correlations need to be calculated. If these correlations are
available, the principal component or factorial characteristics of the
examination can often be roughly identified and items not correlating
well with other items will be known. With the increasing availability
of electronic computing machines, these important calculations can be
more readily made.

Using the same assumptions that were used to derive (1), the estimate
of the square of the item-total score correlation is easily derived as

(5)  $$r^2(IT) = \frac{1+(I-1)\,r(I)}{I}.$$

Using (1), the relationship between r(H), r(I) and r²(IT) is found to
be

(6)  $$r(H) = \frac{r(I)}{r^2(IT)}.$$

Since r(H) is always less than or equal to unity, r²(IT) ≥ r(I). The
relationship between r(H), r²(IT) and I can be found by using (1) and
(6). Thus,

(7)  $$r(H) = \frac{I\,r^2(IT) - 1}{(I-1)\,r^2(IT)}.$$

Since it is common practice to judge an item in part by its r(IT), the
effect of a change in r²(IT) on r(H) when I is altered will be
examined. It is, however, recommended that the more primitive
information, that is, the item-item correlations, be utilized when it
is at all practical to do so.





Analogous to equation (2), we have

(8)  $$\Delta r(H) = \frac{[I+\Delta I]\,[r^2(IT)+\Delta r^2(IT)] - 1}{[I+\Delta I-1]\,[r^2(IT)+\Delta r^2(IT)]} - \frac{I\,r^2(IT) - 1}{(I-1)\,r^2(IT)}$$

and solving for Δr²(IT) gives

(9)  $$\Delta r^2(IT) = \frac{C}{C+D} - r^2(IT)$$

where

$$C = (I-1)\,r^2(IT)$$

$$D = [I+\Delta I-1]\,\{[1-r^2(IT)] - \Delta r(H)\,C\}.$$

For the special case where Δr(H) = 0, equation (9) reduces to

(10)  $$\Delta_0 r^2(IT) = \frac{(I-1)\,r^2(IT)}{(I-1)\,r^2(IT) + [I+\Delta I-1]\,[1-r^2(IT)]} - r^2(IT).$$

The first term on the right of (9) and (10) represents the values of
r²(IT) which are appropriate for the given conditions.
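
The item-total formulas can be exercised in the same way as the item-item ones. The sketch below is illustrative only: the function names are hypothetical, and the starting value of r²(IT) is taken from equation (5) with I = 14 and r(I) = .21 rather than from any observed correlations.

```python
def r2_item_total(n_items, r_i):
    """Equation (5): average squared item-total correlation from I and r(I)."""
    return (1.0 + (n_items - 1) * r_i) / n_items


def r_h_from_r2(n_items, r2_it):
    """Equation (7): r(H) from I and the average r^2(IT)."""
    return (n_items * r2_it - 1.0) / ((n_items - 1) * r2_it)


def delta_r2_item_total(n_items, r2_it, d_items, d_r_h):
    """Equation (9); setting d_r_h to zero gives equation (10)."""
    c = (n_items - 1) * r2_it
    d = (n_items + d_items - 1) * ((1.0 - r2_it) - d_r_h * c)
    return c / (c + d) - r2_it


r2 = r2_item_total(14, 0.21)
needed = delta_r2_item_total(14, r2, -7, 0.04)
print(round(r2, 3))                     # starting r^2(IT)
print(round(r_h_from_r2(14, r2), 3))    # same r(H) as equation (1) gives
print(round(needed, 3))                 # required increase in r^2(IT)
```

Because equation (7) is equation (1) rewritten in terms of r²(IT), the second printed value should agree with the Spearman-Brown result for the same I and r(I).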


Numerical Illustration 





The data used for this illustration were part 
of that collected by Roemmich (4). One of the 
purposes of that study was to develop a prelimin- 
ary form of a checklist which could be used to 
evaluate Sonar teams. The preliminary form of 
the checklist consisted of fourteen items (I = 14) 
which were each rated by a single rater for 24 
Sonar teams (S = 24) who were operating Sonar 
gear on the same tactical problems. The check- 
list items were scored 1, 2 and 3 by the rater de- 
pending upon his judgment of the team’s perform- 
ance. The ‘‘3’’ rating corresponds to the lowest 
rating given and indicates poor performance as 
judged by the rater. Thus the possible range for 
the total scores on the checklist was 42 to 14 with 
a mid-range value of 28. 

A summary of an analysis of variance of the 
rating given on the original form of the checklist 
is given in the first part of Table I. Also includ- 
ed in the first part is the analysis of variance of 
the ratings for a ‘‘revised’’ form of the checklist. 
The revision consisted of eliminating seven of the 
fourteen items of the checklist. It would have been desirable to
present data from an independent sample of Sonar teams on this revised
form, but this information was not available. The second part of
Table I provides a summary of basic statistics of the ratings
associated with the original and revised checklists. The letters
S.E.M. and S.D. are abbreviations for the standard error of
measurement and the standard deviation of the total scores. The
procedures used to evaluate these statistics are given in (3).


                              TABLE I

  ANALYSES OF VARIANCE AND SUMMARIES OF THE ORIGINAL AND REVISED
                         CHECKLIST RATINGS

a) Analyses of Variance

                      Original Form            Revised Form
Source of          Degrees of               Degrees of
Variation          Freedom     Variance     Freedom     Variance

Items                 13         1.6696        6           .8869
Items X Teams                     .2210                    .1664
Mean                            839.1696                 430.7202
Teams                            1.0454                    .9563
Total

b) Summary Statistics

Statistic                Revised

Teams                    24
Items                     7
Mean (Score)             11.21
r(I)                       .40
r(H)                       .83
r²(IT) [Eqn. 5]            .49
S.D. (Scores)             2.59
S.E.M. (Scores)
Mean (Item Diff.)         1.60
S.D. (Item Diff.)          .19
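
The paper refers the reader to (3) for the estimation procedures, so the following is only a sketch under an assumption: that the ''Items X Teams'' and ''Teams'' mean squares estimate S²[1-r(I)] and S²[1+(I-1)r(I)], as stated later in this paper. Solving those two expressions for r(I), and substituting into equation (1), reproduces the r(I) and r(H) values quoted for both forms. The function name is hypothetical.

```python
def reliability_from_mean_squares(ms_teams, ms_items_x_teams, n_items):
    """Return (r_I, r_H) from the 'Teams' and 'Items X Teams' mean squares.

    Assumes E[MS items x teams] = S^2 [1 - r(I)] and
            E[MS teams]         = S^2 [1 + (I - 1) r(I)].
    """
    diff = ms_teams - ms_items_x_teams
    r_i = diff / (ms_teams + (n_items - 1) * ms_items_x_teams)
    r_h = diff / ms_teams
    return r_i, r_h


# Mean squares as printed in part a) of Table I.
original = reliability_from_mean_squares(1.0454, 0.2210, 14)
revised = reliability_from_mean_squares(0.9563, 0.1664, 7)
print([round(v, 2) for v in original])
print([round(v, 2) for v in revised])
```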

The mean total score on the original form was about 22, which is less
than the mid-range value of 28. This shows that the average rated
performance was ‘‘above’’ the average rating. The mean item difficulty
(that is, the average rating of an item) was 1.58, which is almost
midway between a ‘‘good’’ and an ‘‘average’’ rating. The standard
deviation of the item difficulties is .26, and this shows that most
item difficulties were below an average of 2 and above an average of 1.
Essentially the ratings were only 1’s and 2’s, since only fifteen of
the 336 ratings were scored 3. The effective range of the total scores
is then about 29 to 14. The standard deviation of 3.83 shows that the
scores vary to the extent indicated by the possible range. The standard
error of measurement, 1.76, of the total scores is equal to almost half
of the standard deviation of the total scores. This circumstance is
indicated by a fairly low r(I) = .21. The average square of the
item-total score correlations, r²(IT), as estimated from equation (5)
is .27, showing that the average r(IT) is a little over .5. With 14
items r(H) = .79 is a moderate value. In effect, this means that the
rating scale can make some identifications between the teams on the
basis of the total score, but the number of these distinctions is not
very extensive. Actual discriminations could be made by tests of
significance for a group of ranked means.
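
Two of the figures in this paragraph can be tied together by a short worked check. It assumes the classical relation between the standard error of measurement, the standard deviation and the reliability, which the paper does not state explicitly:

$$\text{S.E.M.} = \text{S.D.}\,\sqrt{1 - r(H)} = 3.83\,\sqrt{1 - .79} \approx 3.83\,(.458) \approx 1.76$$

$$r(IT) \approx \sqrt{r^2(IT)} = \sqrt{.27} \approx .52$$

The first line agrees with the quoted standard error of measurement, and the second with the statement that the average r(IT) is a little over .5.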

The seven items retained for the revised checklist were selected from
the original fourteen on the basis of the item-item, item-total intra
and interclass correlations. It is interesting to note that this
process resulted in a selection of items which had about the same
average item difficulty (1.60) as that possessed by the original
fourteen (1.58). However, the above selection did eliminate three of
the four most extreme items in respect to difficulty. This is indicated
by the reduction of the standard deviation of the item difficulty from
.26 to .19.

Of course, the mean score was reduced by about a factor of one-half
because of the selection, but the r(I) increased from .21 to .40, which
is a change of .19. Even though the number of items of the checklist
was reduced by one-half, the r(H) actually increased from .79 to .83.
This is certainly desirable, but the entire process of selection should
be considered carefully. In selecting items on the basis of the largest
item-item and item-total score correlation, items associated with
correlations which have sampling errors which overestimate the true
parametric correlations tend to be selected. This means we cannot be
sure the items will hold up as expected in future tryouts. Most likely
the new analysis will fall short of expectations. This can be
controlled somewhat by insuring that the sampling errors cannot be too
large by originally obtaining a sufficiently large sample of subjects
from which the correlations can be estimated. Also, the scope of the
measurements must not be critically reduced by the item selection
process; otherwise the original and revised checklists cannot be
identically valid. Optimally the internal analysis of the items should
be supplemented by a logical analysis and criterion analysis if a
criterion measurement is available.

The intra and interclass correlations among the items are given in
Table II. The intraclass correlations appear above the main diagonal of
the matrix. In that table are also shown the correlations of the items
with the total scores. In the row called ‘‘T’’, the values are
interclass correlations, but in the ‘‘T’’ column the values were
obtained by assuming that every item had the same variance and this
variance was estimated by the separate item variances. In the last
column of Table II, the item difficulty (I. D.) values for each item
and the general average item difficulty are given. At least part or all
of this information is available to the psychometrist for preliminary
analysis of the responses to the items of an examination.

The formulas developed in the second section of this paper can be used
in various ways with the kind of information contained in Table I. For
instance, if certain items are deleted, equations (3) and (9) can be
used to determine the increase in the average r(I) and r²(IT) required
to increase r(H) by a given amount. Equations (2) and (8) can be used
to predict the change in r(H) when I and the average r(I) or I and the
average r²(IT) are altered. These uses will be illustrated for the data
of Table II. The computations will be carried out with averages of
three types of r(I) and r²(IT) correlations. The first two averages
were obtained by averaging the intraclass and the interclass
correlations of Table II, respectively. The third type of average was
obtained by first transforming the interclass correlations by Fisher’s
z transformation, averaging the z’s and then transforming the average z
back to the correlation scale. All of these methods of averaging
produce values for this problem that are not too distinct
arithmetically, and certainly not practically different.
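
The third kind of average can be sketched briefly. The correlations below are hypothetical placeholders, since the entries of Table II are not reproduced here; only the procedure (transform, average, transform back) follows the text. The function name is likewise an assumption.

```python
import math


def fisher_z_average(correlations):
    """Average correlations on Fisher's z scale and return to the r scale."""
    zs = [math.atanh(r) for r in correlations]   # z = arctanh(r)
    return math.tanh(sum(zs) / len(zs))          # r = tanh(mean z)


# Hypothetical interclass correlations, for illustration only.
sample = [0.10, 0.20, 0.35]
print(round(sum(sample) / len(sample), 3))   # plain arithmetic mean
print(round(fisher_z_average(sample), 3))    # z-transformed mean
```

For correlations of modest size the two averages differ only in the second or third decimal place, which is in keeping with the remark that the methods of averaging are not practically different for this problem.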

The evaluations of equations (1), (3), (5), (7), and (9) for the three
kinds of average correlation and the special conditions where I = 14,
the checklist is reduced by 7 items, and it is desired that r(H) be
increased by .04, are given in Table III. The checklist scores were
re-analyzed by an analysis of variance of the responses to each of the
7 items. The results of this analysis appear in Table I and the
‘‘Table I’’ column of Table III. The statistics r(H) and r(I), as
derived from the analysis of variance used in Table I, assume that the
item variances used to calculate the item-item interclass correlations
are equal and the value is estimated by pooling the separate item
variances.


                              TABLE III

EVALUATION OF CERTAIN EQUATIONS FROM THE TEXT FOR SPECIAL CONDITIONS
           AND INFORMATION OBTAINED FROM TABLES I AND II

a) Conditions: I = 14, ΔI = -7, Δr(H) = .04

                              Table II Correlations
Statistic    Table I    Intraclass    Interclass    z Transformation

r(I)          .210        .217          .227          .220
r²(IT)        .266        .266          .281          .272

b) Evaluation of Equations with Data from Part a)

                                      Table II Correlations
Equation   Statistic    Table I    Intraclass    Interclass    z Transformation

(1)        r(H)          .79         .80           .80
(6)        r(H)          .79         .82           .81
(7)        r(H)          .79         .79           .79
(5)        r²(IT)        .27         .27           .28
(3)        Δr(I)         .19         .20           .21
(9)        Δr²(IT)       .22         .23           .23

The three methods of estimating r(H) by equations (1), (6), and (7)
provide essentially the same numerical estimate even with the varied
types of correlations used to evaluate the equation. The general value
is about .80 and this agrees very well with the result actually found
in Table I, .79. Ordinarily, the new index of internal consistency
would be computed by substituting I = 7 and r(I) = .210 in equation (1)
and ignoring, therefore, any change in r(I). This procedure produces a
value r'(H) = .65. This value is quite in error in respect to .79, and
the example provides evidence that improper use of (1) results in
grossly inaccurate information. The estimate of r²(IT) = .27 or .28
obtained from equation (5) is likewise practically homogeneous for the
four different evaluations. Equation (3), when evaluated for the
conditions cited above, indicates that r(I) needs to be increased by
about .20. This is substantiated by Table I, wherein the increase in
r(I) is given by the difference .40 - .21 = .19. Similarly, for the
same conditions, r²(IT) needs to be increased by about .23 and Table I
indicates the change as .49 - .27 = .22. The results show, at least for
this example, that the index of internal consistency of a revised
examination can be estimated from the item-item or item-total
correlations and equations given in this paper without recourse to
re-scoring the examination and the analysis of variance (or other type
of internal consistency analysis) of Table I. Thus the psychometrist is
able to ‘‘experiment’’ with various revisions and have a reasonable
estimate of the index of internal consistency without undue labor. Of
course, the value of the estimate is conditioned by the fact that the
new r(H) may be an overestimate for a reason discussed previously.
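
The contrast drawn in this paragraph can be written out as a worked substitution in equation (1). Only figures already quoted in the text are used; the second line, which uses the revised r(I) = .40, is included for comparison.

$$r'(H) = \frac{7(.210)}{1 + 6(.210)} = \frac{1.47}{2.26} \approx .65$$

$$r(H) = \frac{7(.40)}{1 + 6(.40)} = \frac{2.80}{3.40} \approx .82$$

The second value is close to the .83 reported for the revised form in Table I, whereas the first, which ignores the change in r(I), falls well short of it.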

The mean score of the revised checklist is the sum of the item
difficulties of the retained items. In this case the revised mean score
is 11.21. The standard deviation and the standard error of measurement
of the revised checklist can be obtained if the variance of a response
is estimated. The estimated variance of a response, S², can be obtained
from the analysis of variance of responses of the original checklist.
The estimated expected values of the ‘‘I x S’’ and ‘‘Teams’’ sources of
variability are S²[1-r(I)] and S²[1+(I-1)r(I)]. For r(I) = .21, then
S² = .2210/.79 = .280 or 1.0454/3.73 = .280. Using r(I) = .40 for the
revised examination, the standard deviation of the total scores is
√(7(.280)[1+6(.40)]) = 2.58 and the standard error of measurement is
√(7(.280)(1-.40)) = 1.08. These values agree very well with those
actually obtained by analyzing the revised examination in Table I. The
standard deviation of the item difficulties is easily obtained by the
usual statistical formula for the standard deviation. The average item
difficulty is simply 11.21/7 = 1.60. Therefore, all the information
available about the revised checklist and obtainable from re-analysis
of the revised checklist can be obtained by using certain formulas
presented in this paper and the labor involved in rescoring and
re-analyzing the examination is by-passed.
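
The arithmetic of this paragraph can be reproduced in a few lines. This is a sketch only, with hypothetical function names; it follows the expressions given in the text, in which the variance of a response is S², the total-score variance is I·S²[1+(I-1)r(I)] and the error variance is I·S²[1-r(I)].

```python
import math


def response_variance(ms_items_x_teams, r_i):
    """S^2 estimated from the 'I x S' mean square: MS / [1 - r(I)]."""
    return ms_items_x_teams / (1.0 - r_i)


def total_score_sd(n_items, s2, r_i):
    """Standard deviation of total scores on an I-item form."""
    return math.sqrt(n_items * s2 * (1.0 + (n_items - 1) * r_i))


def total_score_sem(n_items, s2, r_i):
    """Standard error of measurement of total scores on an I-item form."""
    return math.sqrt(n_items * s2 * (1.0 - r_i))


s2 = response_variance(0.2210, 0.21)          # original checklist
print(round(s2, 3))
print(round(total_score_sd(7, s2, 0.40), 2))  # revised form, r(I) = .40
print(round(total_score_sem(7, s2, 0.40), 2))
```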


Summary 


Equations have been derived which show the effects on the value of the
index of internal consistency, r(H), when the number of items, I, and
the average item-item correlation coefficient, called the coefficient
of internal consistency, r(I), are changed. The changes in r(H) have
been examined for changes in I and the square of the average item-total
correlation. The relationship between r(H), r²(IT), and r(I) was given
as well as the relationship between r(H), I, and r²(IT). Most of the
equations were illustrated by the use of data taken from (4). The
results were in very close agreement to those actually obtained using a
standard procedure. Several equations were evaluated by using
intraclass, interclass and z transformation correlations. The numerical
results were practically identical for all three systems.

Introduction


NUMEROUS essays have appeared in recent years which suggest various
ways in which science instruction may be improved. Especially prominent
in many of these essays is the emphasis on a need for the modification
of science instruction in such a way as to enhance student gain in
ability to solve problems in science and to promote a ‘‘scientific
attitude’’. If one prefers to modify his instruction on the basis of
sound experimental evidence rather than upon opinion, relatively little
published material is available. It was the writer’s hope that the
present study would supply some sound evidence as to which of two
specific methods better provides for student gain in problem solving
and scientific attitude.


Experimental Design 





The completely randomized design was employed, wherein the treatments
are randomly arranged over all of the experimental material. There were
two different treatments in the study and these were specified as
teaching methods.


The Teaching Methods 





It appeared desirable to use for the comparison some popular method of
instruction in general botany and some less common but psychologically
sound and practicable method. The popular method was designated as the
‘‘conventional’’ approach, and it was the method of instruction used at
the University of Minnesota for more than five years. The general
botany course offered at Minnesota consists of lectures, demonstrations
and laboratory work. The students receive two one-hour lectures and two
two-hour laboratory periods each week. Data presented by Miller¹**
suggest that the method of instruction in use at Minnesota is common
for other public colleges and universities.

The project centered method was similar to the conventional method in
that it also made extensive use of lectures, demonstrations, and
laboratory work. However, material was presented more rapidly and a six
week period was devoted exclusively to individual student project work.
In order to accelerate the presentation of botanical facts and
principles, lecture notes were dittoed and distributed to the students
before each lecture. These notes were not read, but they contained most
of the definitions, drawings, and similar information which students
would normally write into their notes. This freed the instructor to the
extent that definitions need not be slowly recited, and more
illustrative material could be presented as well as more frequent
reference to the definitions. The students were freed from the
mechanical task of writing notes and were asked only to listen to the
lectures. Laboratory work was accelerated through the use of a revised
schedule, closer integration of lecture and laboratory work, and the
use of labeled photomicrographs in place of line drawings which
students labeled. A more complete description of the methods is
available elsewhere.²


The Population Involved 





The 522 students involved in this study were those registered in
General Botany 1 and/or 2 during the fall and winter quarters of
1956-57 at the University of Minnesota. Each student registered for one
of two lecture sections and one of six laboratory sections. Each
laboratory section was in turn divided at random into two or three
subsections. Two of these subsections were taught by the writer. One of
these subsections, selected at random, was taught with the conventional
method, and the other subsection was taught by the project centered
method. All students attended one of the two conventional lecture
sections and one of the seventeen laboratory sections except those
students in the project centered laboratory section. The latter
students attended a special lecture section taught by the writer.
Though this section was small, an effort was made to conduct the
section in a manner which would be equally effective for groups of 200
or more. The two conventional lecture sections were taught by two
experienced professors of the Botany Department. The laboratory
sections were taught by graduate teaching assistants and other staff
members with Ph.D. degrees.


* The writer would like to express his appreciation to Dr. Palmer O.
Johnson and Dr. John W. Hall for the encouragement and assistance in
carrying out the investigation. He also acknowledges the financial
assistance of the Lydia and Alexander Anderson Summer Fellowship and
Tozer Foundation Scholarship for 1956-57.
**All footnotes will be found at end of article.


Measuring Instruments Used in the Study 





The measurement of knowledge of facts and principles is relatively
simple and the tests employed for this purpose were not appreciably
different from those used in the conventional course in past years. The
tests consisted of multiple-choice, matching, cluster-type true-false
items, and variations of these test morphologies. Laboratory practical
examinations were given, and these typically required the student to
identify a plant part or plant process and occasionally to relate other
information to the observed material in order to give the correct
answer.

In order to facilitate the construction of a problem-solving test and a
test to measure change in scientific attitude, it appeared desirable to
employ some of the concepts of information theory and thus to formulate
a description of problem solving and scientific attitude. Under the
theory established, problem solving in science was described as the
employment of feedback to gain information and to modify the success
probability of given courses of action as new information was received.
It was suggested that emotional responses constituted a source of
internal information which in turn might affect subsequent behavior.
The importance of emotion in the quality referred to as an attitude is
generally recognized³ and for the purpose of this study, the scientific
attitude was defined as the emotional response of recognized scientists
to statements or activities which appeared related to the process of
problem solving. The problem-solving test will be described elsewhere⁴
in greater detail.

Validity of the problem solving test and the scientific attitude test
was established by the use of judgments from staff members in the
Botany Department and College of Education at Minnesota. Furthermore,
the ‘‘correct’’ responses to these two tests were taken as the
responses given by the validating group. There was good general
agreement among them as to what the responses should be.

Reliability for the various tests used was established through the
methods of Hoyt,⁵ Jackson,⁶ and the split-half technique. The
reliability coefficients were r = 0.30 to 0.50 for the problem-solving
test, r = 0.53 for the scientific attitude test, and r = 0.41 to 0.85
for the various facts and principles tests.

For purposes of covariance adjustment and for checking on subsamples
from the population of students, three pre-tests were given. The
problem-solving test, a specially constructed facts pre-test, and the
American Council on Education Psychological Examination were used. The
scientific attitudes test was administered during the second week of
the second quarter. Correlation coefficients between the pre-tests and
the criterion tests were calculated to aid in test selection for
covariance adjustment. Most of the correlation coefficients were found
to be less than 0.25, and the highest correlation between a pre-test
and a criterion test was between the problem-solving pre-test and the
lecture facts and principles test. Table I gives the correlation
coefficients among the tests used. The low correlation between the
problem-solving pre-test and the ACE test (r = 0.13) suggests that
these two tests may measure different abilities.


Analysis of the Results 








Though the students in the conventional and project centered sections
were selected at random, it was found that the group receiving the
conventional method of instruction exceeded the project centered group
in mean ACE rank. The difference in means for the two groups was found
to be significant at the 2 percent level with a t value of 2.58 and 52
degrees of freedom. This apparent difference in general ability in
favor of the group taught by the conventional method was adjusted for
by the use of the analysis of covariance in future comparisons of the
groups on the criterion tests. However, it should be noted that the low
correlation between the ACE test and the criterion tests, reported
above, suggested that complete adjustment for initial difference in
ability between the conventional and the project centered groups would
not occur with the use of analysis of covariance, if such adjustment
were appropriate. When the conventional and the project centered groups
were compared on the problem-solving pre-test and the botanical facts
pre-test, no significant differences in means or in variance were
found.

Comparison of the Methods on Tests for
Knowledge of Facts and Principles








The conventional and project centered groups were compared on the fall
quarter lecture facts and principles test. An F value of 1.63 was
obtained with n₁ = n₂ = 21. It was concluded that the conventional and
the project centered group did not differ significantly in variance and
we proceeded with a comparison of the means. A t value of 1.12 was
obtained and this value was found to correspond to a value of P greater
than 0.05 when referred to the table of t with 41 degrees of freedom.
It was concluded that the two groups did not differ significantly in
mean score on the fall lecture facts test. The data are summarized in
Table II. An analysis of covariance⁷ was performed on the lecture facts
scores with ACE and problem-solving pre-test scores held constant.
Partial regression coefficients within sections were tested for
homogeneity with Welch’s equation and Nayer’s tables. An L₁ value of
0.986 was obtained and when referred to Nayer’s tables with f₂ = 18, it
was found to be greater than the table value at the 5 percent level. It
was concluded that the partial regression coefficients within sections
were homogeneous. Proceeding with the analysis of covariance, the
adjusted sums of squares were obtained. The mean square for within
methods was found to be larger than the mean square for between
methods. The null hypothesis was not rejected and it was concluded that
there was no significant difference between the means of the
conventional and the project centered groups when ACE and
problem-solving pre-test scores were held constant. The data are
summarized in Table III.

A comparison of the project centered and the conventional group on the
winter quarter lecture facts and principles test resulted in an F value
of 2.46, which gave a value of P less than 0.05 when referred to the
table of F with 16 and 27 degrees of freedom. The null hypothesis of
equality of variance between the two groups was rejected, and it was
concluded that the project centered group was significantly more
variable on the test for knowledge of botanical facts and principles. A
comparison of the means through the use of the Behrens-Fisher test⁸
gave a d₀ value of 1.32. With θ = 57° and n₁ = 16, n₂ = 27, the
observed value of d₀ was less than the table value for the 5 percent
level of significance. The null hypothesis was not rejected and it was
concluded that the project centered and the conventional group did not
differ in mean achievement on the winter lecture facts and principles
test. The data are summarized in Table IV.
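
The two statistics quoted in this paragraph can be recomputed from the N, mean and variance entries of Table IV. The sketch below assumes the usual form of the Behrens-Fisher statistic, d = (difference in means) / √(s₁²/n₁ + s₂²/n₂); the function names are hypothetical, and the critical values still have to be read from the published tables, which are not reproduced here.

```python
import math


def variance_ratio(var_a, var_b):
    """F ratio: the larger sample variance divided by the smaller."""
    return max(var_a, var_b) / min(var_a, var_b)


def behrens_fisher_d(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """d statistic: absolute mean difference over its unpooled standard error."""
    return abs(mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


# Entries of Table IV: project centered group, then control group.
f0 = variance_ratio(32.809, 13.358)
d0 = behrens_fisher_d(29.06, 32.809, 17, 31.11, 13.358, 28)
print(round(f0, 2))
print(round(d0, 2))
```

Because the variances are not pooled, the d statistic remains usable when, as here, the F test has rejected equality of variance.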

When the winter laboratory practical examination was used as a
criterion, a significant difference in variance was found, again with
higher variability in the project centered group. The F value obtained
was 2.20 and with n₁ = 16 and n₂ = 27, the value of P was less than
0.05. The tendency for the project centered group to be more variable
than the conventional group suggested that individual differences are
better provided for with the project centered method of instruction.
The means for the project centered and the conventional group were
compared on the laboratory practical examination and a d₀ value of 0.28
was obtained. With θ = 59° and n₁ = 16, n₂ = 27, the observed value of
d₀ was not significant. The data are summarized in Table V.

An analysis of covariance was performed on the winter quarter
laboratory practical examination scores with ACE and problem-solving
pre-test scores held constant. Partial regression coefficients within
sections were tested for homogeneity and an L₁ value of 0.987 was
obtained with K = 2 and f₂ = 16. The value of L₁ was not significant at
the 5 percent level and the partial regression coefficients were
considered to be homogeneous. Adjusted sums of squares were obtained
and an F value of less than 1.0 was found. The null hypothesis was not
rejected and it was concluded that the project centered and the
conventional groups did not differ significantly in mean score on the
laboratory practical test when ACE and problem-solving pre-test scores
were held constant. The data are summarized in Table VI.


Fact Retention Test Analysis 





In order to determine whether or not fact re- 
tention might be related to method of instruction 
and student’s ability level, as measured by the 
ACE test, a two-way analysis of variance was 
performed. First a one-way analysis of var- 
iance was used to test for differences between 
the project centered and the conventional groups 
on the fact retention post-test. The basic data 
are given in Table VII and the sums of squares 
were calculated as follows: 


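i = subclass = 1, ..., 6
k = student = 1, ..., 5


Among subclasses:

$$\sum_i \frac{X_{i.}^2}{5} - \frac{X_{..}^2}{30} = \frac{76^2 + 80^2 + 74^2 + 90^2 + 63^2 + 73^2}{5} - \frac{456^2}{30} = 78.8$$


Within subclasses:

$$\sum_i \sum_k X_{ik}^2 - \sum_i \frac{X_{i.}^2}{5} = 124.0$$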


TABLE IV


COMPARISON OF PROJECT CENTERED AND CONTROL METHODS ON
WINTER LECTURE EXAMINATION


Method              N     Mean     Variance    Null Hypothesis

Project Centered    17    29.06    32.809      F0 = 2.46*   reject
Control             28    31.11    13.358      t0 = 1.32    accept

*Significant at the 5 percent level


TABLE V


COMPARISON OF PROJECT CENTERED AND CONTROL METHODS ON
LABORATORY PRACTICAL EXAMINATION


Method              Variance    Null Hypothesis

Project Centered    66.441      2.20*   reject
Control             30.025      .28     accept

*Significant at the 5 percent level


With the null hypothesis $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_6$,
we obtained an $F_0$ value of 3.05. Referring
this value to Snedecor's tables with $n_1 = 5$ and
$n_2 = 24$ degrees of freedom, a value of P less
than 0.05 was observed and the hypothesis $H_0$
was rejected. It should be noted that we have
assumed homogeneity of variance, which could
be tested with Bartlett's test. The data are
summarized in Table VIII.
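
The one-way analysis can be checked with a few lines of arithmetic. The sketch below is only an illustration: the six subclass totals are an assumption, chosen to agree with the subclass means and marginal sums quoted in the text (Table VII itself is not reproduced here), and the within-subclass sum of squares of 124.0 is taken as reported.

```python
# Subclass totals on the fact retention post-test, five students per subclass.
# Assumption: totals chosen to agree with the means and sums quoted in the text.
subclass_totals = [76, 80, 74, 90, 63, 73]
n_per_subclass = 5
ss_within = 124.0  # within-subclass sum of squares as reported in the text

n_subclasses = len(subclass_totals)
n_total = n_subclasses * n_per_subclass
grand_total = sum(subclass_totals)

# Among-subclass sum of squares: sum(X_i.^2)/5 - X..^2/30
correction = grand_total ** 2 / n_total
ss_among = sum(t ** 2 for t in subclass_totals) / n_per_subclass - correction

df_among = n_subclasses - 1          # 5
df_within = n_total - n_subclasses   # 24
ms_among = ss_among / df_among
ms_within = ss_within / df_within
f_ratio = ms_among / ms_within

print(df_among, df_within, round(ms_among, 4), round(ms_within, 4), round(f_ratio, 2))
```

Under these assumptions the mean squares agree with the 15.7600 and 5.1667 shown in Table VIII, and the ratio agrees with the reported value of 3.05.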

Since we had rejected $H_0$ above, we observe
that two main factors may have given the
significant differences among the subgroups. These
were one factor related to method of instruction
and one to ability level. In addition, there
existed the possible effect of interaction between the
two main factors. Our new model would be:
$X_{ijk} = A + B_i + C_j + I_{ij} + Z_{ijk}$, where $A$ is some
general mean, $B_i$ denotes a treatment factor, $C_j$
denotes an ability factor, $I_{ij}$ denotes a factor due
to interaction of ability and treatment, and $Z_{ijk}$
denotes the effect due to error or uncontrolled
factors. With this model we tested the null
hypotheses that the treatment effects $B_i$, the
ability effects $C_j$, and the interaction effects
$I_{ij}$ are each zero.


The means for each of the subclasses in 
Table VII were taken as single observations and 
sums of squares were calculated for the various 
factors. Sums of squares were calculated as fol- 
lows: 


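$i$ = method
$j$ = ACE group


Between methods:

$$\sum_i \frac{X_{i\cdot}^2}{3} - \frac{X_{\cdot\cdot}^2}{6} = \frac{42.6^2 + 48.6^2}{3} - \frac{8317.44}{6} = 6.00$$


Among ACE groups:

$$\sum_j \frac{X_{\cdot j}^2}{2} - \frac{X_{\cdot\cdot}^2}{6} = \frac{31.2^2 + 32.8^2 + 27.2^2}{2} - \frac{8317.44}{6} = 8.32$$


Total:

$$\sum_i \sum_j X_{ij}^2 - \frac{X_{\cdot\cdot}^2}{6} = 1407.96 - 1386.24 = 21.72$$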

The sums of squares were multiplied by 5,
the constant number of individual scores in each
subclass, to give comparable values for later
analysis. The data are summarized in Table IX.

Generalizability across situations was limited in
that only the writer taught the project centered
method, in only one school, under only one set
of conditions. The effect of this is that the
appropriate analysis of variance model would be
determined at the time the experiment was
designed.

There are two main models underlying the an- 
alysis of variance, i.e., the fixed effects model 
and the random model. There are additional 
models which are combinations of both fixed and 
random models. The important practical consid- 
eration is illustrated in the practice leading to the 
comparison of the primary effects with either the 
residual mean square or the interaction mean 
square in the analysis of variance table. The ex- 
perimental design used in this study may be said 
to be best described by model I or the fixed effect 
model. The mathematical expectations of the 
mean squares (leading to component analysis) are 
given in the last column of Table X. In the fixed 
effect model, the residual component is used as 
the denominator of the F ratio used to evaluate 
the significance of both of the main effects and 
the interaction component. 
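
A short sketch of the fixed effects F ratios may help here. It assumes sums of squares on the subclass means of 6.00 (methods), 8.32 (ACE groups), and 7.40 (interaction), the last two following from the sums-of-squares formulas and the printed total, and it uses the within-subclass mean square of 124.0/24 as the common denominator.

```python
# Sums of squares computed on subclass means (assumed values, see lead-in).
ss_on_means = {"methods": 6.00, "ACE groups": 8.32, "interaction": 7.40}
degrees_of_freedom = {"methods": 1, "ACE groups": 2, "interaction": 2}

n_per_subclass = 5           # scores per subclass; rescales means to raw-score units
ms_within = 124.0 / 24       # within-subclass mean square from the one-way analysis

f_ratios = {}
for source, ss in ss_on_means.items():
    mean_square = n_per_subclass * ss / degrees_of_freedom[source]
    # Fixed effects model: every effect is tested against the within-subclass term.
    f_ratios[source] = mean_square / ms_within

for source, f in f_ratios.items():
    print(f"{source}: F = {f:.2f}")
```

With these inputs the ratios for ACE groups and for interaction agree with the values of 4.03 and 3.58 reported in the text.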

The limits of generalizability were set as the
situation in which the investigator taught both the
project centered and the conventional sections,
the sections being chosen at random from the
available sections and the students in these
sections also being chosen at random. With this
interpretation it was concluded that it would be
highly probable (a five percent level of significance
for errors of the first kind was used) that the
findings would have been the same if the
investigator had carried out the experiment with any
two of the other laboratory sections available for
the experiment.

It was found that the "between methods"
variance ratio, i.e., the mean square for between
methods divided by the mean square for within
subclasses, was significant at the 5 percent level
and the null hypothesis for methods was rejected
(see Table X). We note from Table VII that the
mean for students of the project centered group was
14.60 as compared with one of 16.60 for the
students of the conventional group.


Table X shows a significant variance ratio of
4.03 for the among ACE groups comparison, and
the null hypothesis for ability groups was
rejected. We concluded that there were significant
differences among the means on the fact retention
post-test for the ACE ability groups.

The interaction between methods and ability
groups was significant, F = 3.58, and the null
hypothesis for interaction was rejected. This
indicated that the two components, methods and
ability groups, did not account for the total
variation among means of the subclasses. An
explanation of the interaction component is afforded
by the data in Table VII. Here we observed that
the project centered group had the highest fact
retention post-test mean (15.2) for the high ability
group whereas the highest mean (18.00) for the
conventional group was for the middle ability group.


TABLE VIII


ONE-WAY ANALYSIS OF VARIANCE FOR SUBCLASSES OF TABLE VII


Source of
Variation            d.f.    S.S.     M.S.       F        Hypothesis

Among Subclasses     5       78.8     15.7600    3.05*    reject
Within Subclasses    24      124.0    5.1667

* .01 < P < .05


TABLE IX


TWO-WAY ANALYSIS OF VARIANCE OF MEANS OF SUBCLASSES FOR TABLE VII


Source of
Variation

Between Methods
Among Groups
Residual
Total

The means given in Table VII suggested that 
the interaction between method and ability group 
is such that the conventional method enhanced 
performance of the middle ability group on the 
fact retention post-test, whereas the project
centered method resulted in mean achievement
proportional to the ability group. These
results are in general agreement with other an- 
alyses presented above which suggested that the 
project centered method of instruction enhanced 
group variability and the conventional method of 
instruction resulted in relatively lower group var- 
iability. 


Performance on the Problem-Solving Tests





The project centered and conventional groups
were compared on mean scores for the
problem-solving post-test. No significant difference in
variance was found, and the observed $t_0$ value
of 1.59 was not significant. It was concluded
that the project centered and the conventional
groups did not differ significantly in mean score
on the problem-solving post-test. The data are
summarized in Table XI.

The sums of squares for the project centered
and the conventional groups were adjusted for
inequalities of groups on the problem-solving
pre-test. The partial regression coefficients within
sections were tested for homogeneity and an $L_1$
value of 0.991 was obtained; with K = 2 and
f = 22 the value for P was greater than 0.05 and
the sections were considered homogeneous. The
analysis of variance was completed and the
adjusted mean square for "within methods" was
found to be greater than the adjusted mean
square "between methods". It was concluded
that the project centered and the conventional
groups did not differ significantly in mean scores
on the problem-solving post-test when
problem-solving pre-test scores were held constant. The
data are summarized in Table XII.


Comparison of Methods on the Scientific 
Attitudes Test 








The test for scientific attitude contained
thirty-five items. The raw score means for the
project centered and the conventional groups on
the scientific attitude pre-test were 28.88 and
28.68, respectively. Little absolute change in
mean score on the scientific attitude test was
found and the change was not significant for
either the project centered or the conventional
group. The data are summarized in Table XIII.
It was concluded that neither group made a
significant mean gain in scientific attitude over the
two-month interval, as measured by the tests
used.


Summary and Recommendations 








An experimental comparison of student
achievement under two different methods of instruction
in general botany was presented. One method
was the teaching approach used at a large state
university (conventional method) and the other
method was similar except that material was
presented more rapidly and a six-week period was
devoted exclusively to project work (project
centered method). The two methods were compared
as to student change in (1) knowledge of botanical
facts and principles, (2) ability to solve problems
in science, (3) gain in scientific attitude, and (4)
retention of factual knowledge.

The experimental design was described, and 
appropriate analysis of variance and covariance 
of the data was presented. No significant differ- 
ences in means were found under the two methods 
of instruction, with the exception that a two-way 
analysis of variance of the fact retention post-test 
scores and ACE percentile ranks indicated a sig- 
nificant difference in means on the fact retention 
test in favor of the group taught by the conven- 
tional method. The same analysis showed a sig- 
nificant interaction between ability group and 
method of instruction. It was also shown that the
group taught by the project centered method was 
significantly more variable in achievement on 
tests for botanical knowledge. The latter data 
together with data from the two-way analysis of 
variance suggested that the project centered 
method provided better for individual differences 
with achievement under this method proportional 
to level of ability. 

Knowledge acquired by the students during the
six-week period of project work was not
measured in this study. In view of the fact that the
project centered method was found to be at least
as effective as the conventional method in
teaching botanical facts and principles, though the rate
of presentation was more rapid, the writer
recommends that others try a similar approach to
instruction in college general botany. It remains
to be shown that the period of project work
contributes to student gains in the four areas
mentioned, but there was a suggestion of superior
achievement on the problem-solving test and
scientific attitudes test for the project centered
group, though this difference was not significant.
Future experimentation with similar tests of
higher reliability may actually show a significant
difference in favor of the project centered
method of instruction.


THE DIMENSIONS OF CANCER KNOWLEDGE
OF DENTAL STUDENTS


PETER G. LORET - RICHARD B. WEST 
Cancer Research Institute 
University of California School of Medicine 
WILLIAM B. MICHAEL** 
University of Southern California 


IN A RECENT paper, Towner (4) has reported
upon the manner in which medical students at
different stages in their formal training organize
their knowledge of cancer on twelve parts of an
objective achievement examination into a number
of clearly discernible factors. In each of four
analyses carried out with representative samples
of students who were freshmen, sophomores,
juniors, and seniors in a majority of the
medical schools of the United States, Towner
found that between three and five interpretable
factors appeared, the nature of each of which
was thought to change to some extent with the
amount of learning. Because of the importance
of cancer knowledge in dental training, a similar
test was administered during the spring
semester of 1955 to more than ten thousand dental
students at thirty-seven dental schools
throughout the United States.


Problem 


It is the purpose of this study (1) to ascertain
the nature of the organization of the knowledge of
cancer possessed by dental students as reflected
by their responses to objective items in an
achievement test, (2) to determine the manner
in which the pattern of organization of knowledge
may change relative to the amount of formal
training received in the dental school program,
and (3) to gain evidence relative to the amount
of correlation among statistically isolated
dimensions as to whether the areas of knowledge
about cancer tend to become more highly
integrated or differentiated as students progress in
their program. It appeared that the most nearly
appropriate statistical model that could be
employed in the investigation of the problem was
that of multiple-factor analysis as formulated by
Thurstone (3).

In the application of the model, the tetrachor- 
ic coefficients of correlation among the eleven 
parts of the test were factored in an attempt to 
find the smallest number of psychologically 
meaningful dimensions that would describe the 
observed intercorrelations among the eleven 
parts of the examination. 


The Examination 





A description of the 1955 Cancer Knowledge
Examination, which was composed of 90
multiple-choice items, is given in Table I. For 2621
freshmen, 2525 sophomores, 2333 juniors, and
2275 seniors, test papers were scored that
yielded means and standard deviations, respectively,
of 27.7 and 6.7; 45.8 and 12.6; 56.9 and 11.0;
and 65.5 and 8.3; and Kuder-Richardson
reliability estimates of .61, .88, .86, and .79,
respectively.


Selection and Samples 





From each of the four groups cited a rep- 
resentative sample of 400 papers was se- 
lected through the use of Neyman’s meth- 
od as described in Snedecor (2:459-61). In 
essence, this method permits the selection 
of a sample with characteristics in propor- 
tion to those in the population from which 
it is chosen. Appropriate statistical an- 
alyses indicated that the samples were 
truly representative of the populations from 
which each sample had been drawn. 


* This study was carried out in the Education Project which is supported by grants from the National 
Cancer Institute, National Institutes of Health, United States Public Health Service. 


**Appreciation is expressed to Edith Jay for her assistance in the rotational aspects of the factor
solution.


TABLE I


TOPICS COVERED IN 1955 CANCER KNOWLEDGE EXAMINATION AND
THE NUMBER OF ITEMS PERTAINING TO EACH TOPIC


Intended Dimension
of Knowledge        Content of Topic

Diagnosis           History and physical diagnosis of tumors
                    (Diagnosis based on patient's history
                    and physical exam)

                    X-ray diagnosis of tumors
                    (Diagnosis by means of x-ray and
                    fluoroscopy)

                    Laboratory and special diagnosis of tumors
                    (Diagnosis by laboratory and special
                    procedures)

Biological          Etiology of tumors
Characteristics     (Causation of disease; sum of knowledge
                    regarding causes)

                    Incidence of tumors
                    (Range of occurrence)

                    Biology of tumors
                    (Biological characteristics)

                    Histology and pathology of tumors
                    (Structure, composition, and function of
                    tissue; and structural and functional
                    changes)

                    Metastasis of tumors (Spread)
                    (Transfer of disease from one organ or
                    part to another not directly connected to it)

                    Prognosis of tumors
                    (Forecast of probable result of an attack
                    of disease; prospects of recovery)

Treatment           Surgical treatment of tumors

                    X-ray and isotope treatment of tumors


Factor Analysis





The matrix of tetrachoric intercorrelations
among the eleven sub-tests, or topics, was
subjected to a centroid analysis as outlined by
Fruchter (1). In the case of the correlational
matrices corresponding to freshmen and seniors
two extractions were made, since at the end of
the first extraction the initial estimate of
communality (highest correlation coefficient in the
column) and the sum of squares of loadings in
the centroid factors extracted differed by more
than .10 on one or more tests. Although a precise
criterion apparently does not exist for deciding
the point at which factor extraction in centroid
analysis should cease, it was decided that any
centroid factor with a loading equal to or
exceeding .25 should not be eliminated. Five factors
were retained in the instance of the matrices for
sophomores and juniors, and six factors for the
other two matrices.
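
One step of centroid extraction can be sketched as follows. This is an illustration only: the 3 x 3 matrix is hypothetical (the study used eleven sub-tests), the diagonal holds the highest correlation in each column as the communality estimate, and sign reflection of variables before later extractions is omitted.

```python
import math

def first_centroid_factor(r):
    """Centroid loadings: column sums divided by the square root of the grand sum."""
    n = len(r)
    column_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    grand_sum = sum(column_sums)
    return [s / math.sqrt(grand_sum) for s in column_sums]

def residual_matrix(r, loadings):
    """Correlations left after removing the factor's contribution a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]

# Hypothetical correlations; diagonal = highest correlation in the column.
r = [
    [0.60, 0.60, 0.30],
    [0.60, 0.60, 0.40],
    [0.30, 0.40, 0.40],
]

loadings = first_centroid_factor(r)
residual = residual_matrix(r, loadings)

# Retention rule from the text: keep a factor if any loading reaches .25.
keep_factor = any(abs(a) >= 0.25 for a in loadings)
print([round(a, 3) for a in loadings], keep_factor)
```

A useful check on the arithmetic is that the entries of the residual matrix sum to zero after each centroid factor is removed.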

Blind rotations of pairs of axes were carried 
out through use of the single plane method, as de- 
scribed by Thurstone (3). The criterion of sim- 
ple structure was used as the guiding principle 
in the rotation process, with some consideration 
given to the criterion of positive manifold in the 
final adjustments effected toward the close of the 
rotation process. For each of the four centroid 
matrices, five hyperplanes could be fairly clear- 
ly defined. 

The four final rotated oblique factor matrices
(V-matrices) that correspond to each of the four
dental classes are presented in Table II. The
entries in each of these V-matrices constitute the
perpendicular projections of the test vectors
upon the normals to the hyperplanes, normals that
are sometimes called reference vectors. A
cosine matrix, known as the A'A matrix, was
calculated relative to each V-matrix to ascertain
the extent to which the factors (reference
vectors) were correlated. None of these cosine
matrices is reproduced.
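
The cosine matrix can be illustrated with a small sketch. It assumes that the transformation matrix (written A here) carries one reference vector per column; the numbers are hypothetical and are not taken from the study.

```python
import math

def unit_columns(matrix):
    """Rescale each column to unit length so that inner products are cosines."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    lengths = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
               for j in range(n_cols)]
    return [[matrix[i][j] / lengths[j] for j in range(n_cols)] for i in range(n_rows)]

def cosine_matrix(matrix):
    """Return A'A for the unit-length columns of A."""
    cols = unit_columns(matrix)
    n_rows, n_cols = len(cols), len(cols[0])
    return [[sum(cols[i][p] * cols[i][q] for i in range(n_rows))
             for q in range(n_cols)] for p in range(n_cols)]

# Hypothetical transformation matrix: three centroid axes, two reference vectors.
transformation = [
    [0.80, 0.10],
    [0.60, -0.20],
    [0.00, 0.97],
]

cosines = cosine_matrix(transformation)
for row in cosines:
    print([round(c, 3) for c in row])
```

Off-diagonal entries near zero indicate reference vectors that are close to orthogonal.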


Interpretation of Factors 





For each of the four samples of students, five
factors (reference vectors in the V-matrix) were
identified. However, considerable caution should
be exercised in the psychological interpretation
or description of the factors, since in many
instances only two or three variables were
definitive of the factor. In general, the same five
psychologically meaningful factors seemed to
appear for each of the samples with the noteworthy
exception of the group of juniors. Despite the
under-determination of the factor structure, five
factors were tentatively, but seemingly
meaningfully, described for the three college groups as
being related to the possession of (1) general
information about cancer, (2) knowledge about
diagnosis, (3) familiarity with treatment
procedures, (4) ability in the area of forecasting
proficiency (that is, the ability to predict the
probable result of an attack of the disease and the
likelihood of its containment or spread to other
organs or parts of the body), and (5) knowledge
of the biological characteristics of tumors.

Although in the case of the juniors the same
three factors of diagnosis, general information,
and forecasting proficiency appeared, the factor
of treatment did not emerge in a clear-cut
fashion as in the other three analyses. Instead there
were two factors that were associated with
knowledge of biological characteristics, one of which
might seem, in terms of curricular emphasis,
to represent knowledge of oral pathology.
Detailed information concerning the interpretation
of factors relative to their loadings on each of
the topics (sub-tests) is presented for freshmen,
sophomores, juniors, and seniors in Tables III,
IV, V, and VI, respectively.

That the factor structure was somewhat less
clear-cut for the junior group was thought
perhaps to be a function of differences in the
curriculum and of differences in instructional
emphasis. It seemed reasonable to believe that
students at the freshman and senior levels would
tend to be somewhat more homogeneous with
respect to their degree of experience and
possession of information than sophomores and juniors,
who may have been exposed to different dental
subjects at different times. In other words, at
the beginning and at the end of the four-year
program the students within the same academic
class should be at about the same stage in their
familiarity with dental topics, but within the
four-year period a certain amount of discrepancy
should be expected because of the variability in
the extent to which different areas of the
curriculum have been emphasized in individual dental
schools. Additional evidence in support of such
a hypothesis is apparent not only in the presence
of different tests in the identification of a given
factor in each of the four factor matrices, but
also in the changes in the magnitude of the
loadings of a given test in the same factor in
each of the four factor matrices.

From a study of the four A'A (cosine)
matrices, which as previously mentioned are not
presented, it was apparent that, since the
intercorrelations among the reference vectors (the
perpendiculars to the hyperplanes) were
extremely low, the factors were close to being
orthogonal. In view of the small number of variables
defining each factor, the magnitude of the
correlation between the factors should be subject to
scrutiny. In each of the four cosine matrices
the two numerically largest cosines were -.25
and -.24, -.52 and -.48, -.40 and -.39, and -.45
and -.42. Most of the cosine entries varied
between -.10 and .10.

The corresponding primary factors would be
only slightly correlated. The existence of a
meaningful second order factor representing
some sort of general factor would seem to be
somewhat questionable in view of the close
approximation of the structure of isolated factors
to orthogonality. Since no pronounced trend
was found as to the degree of correlation among
corresponding pairs of reference vectors, no
conclusion could be made regarding the tendency
for the factors to become more or less
correlated in terms of the length of time spent in dental
school. There seemed to be no decided trend in
either the integration or differentiation of areas
of knowledge in terms of either an increase or
decrease, respectively, in the amount of
correlation among factors.


TABLE II


ROTATED OBLIQUE V FACTOR MATRICES OBTAINED FROM ELEVEN
TOPICS COVERED IN THE 1955 CANCER KNOWLEDGE
EXAMINATION

[Four matrices, one each for freshmen, sophomores, juniors, and seniors; rows are topics, columns are factors (reference vectors). The entries are illegible in the source.]


TABLE III


INTERPRETATION OF FACTORS OBTAINED FROM FRESHMAN DENTAL STUDENT
PERFORMANCE ON ELEVEN TOPICS IN THE 1955 CANCER KNOWLEDGE
EXAMINATION


Factors  Topics  Loadings  Interpretation of Factors

I        3       69        General information: familiarity with commonly known
         7       47        facts regarding pre-cancerous states or conditions and
         4       44        with factors contributing to the development of the
                           disease.

                 87        Diagnosis: knowledge concerning the diagnosis of tumors
                 86        by means of history, physical examination, x-rays,
                 33        laboratory and special methods.

                 75        Treatment: familiarity with the nature and effectiveness
                 66        of traditional therapeutic measures that involve some
                 37        academic knowledge of the distinction between malignant,
                 33        potentially malignant, and benign tumors.
                 28

                 69        Forecasting proficiency: knowledge of facts relative to
                 65        the probability of the containment or spread of the disease
                           and to the likelihood of recovery or survival of the patient
                           (prognosis).

                           Biological characteristics: academic knowledge of the
                           biological properties of tumors, their incidence, and their
                           histological and pathological aspects, perhaps in relation
                           to history and physical diagnosis.


TABLE IV


INTERPRETATION OF FACTORS OBTAINED FROM SOPHOMORE DENTAL STUDENT
PERFORMANCE ON ELEVEN TOPICS IN THE 1955 CANCER KNOWLEDGE
EXAMINATION


Factors  Topics  Loadings  Interpretation of Factors

I                42        Diagnosis: knowledge concerning the diagnosis of tumors
                 34        by means of history, physical examination, x-rays,
                 30        laboratory and special methods with perhaps some added
                 28        information in oral pathology as reflected by factor loading
                           on Topic 7.

                 54        Treatment: familiarity with the nature and effectiveness
                 43        of traditional therapeutic measures that involve some
                 38        academic knowledge of the distinction between malignant,
                           potentially malignant, and benign tumors.

                           General information: familiarity with commonly known
                           factors regarding pre-cancerous states or conditions and
                           with factors contributing to the development of the
                           disease.

                           Biological characteristics: academic knowledge of the
                           biological properties of tumors, their incidence, and
                           etiology.

                           Forecasting proficiency: knowledge of facts relating to
                           the probability of the containment or spread of the disease
                           and to the likelihood of recovery or survival of the patient
                           (prognosis).


TABLE V


INTERPRETATION OF FACTORS OBTAINED FROM JUNIOR DENTAL STUDENT
PERFORMANCE ON ELEVEN TOPICS IN THE 1955 CANCER KNOWLEDGE
EXAMINATION


Factors  Topics  Loadings  Interpretation of Factors

I                59        Biological characteristics I: knowledge of oral pathology
                 32        apparently gained from exposure to a curriculum in which
                 31        metastasis of tumors is emphasized and in which topics of
                 29        incidence, prognosis, and histology and pathology of
                 28        cancer are considered.

                 69        Diagnosis: knowledge concerning the diagnosis of tumors
                 69        by means of history, physical examination, x-rays,
                 21        laboratory and special methods.

                           General information: familiarity with commonly known
                           factors regarding pre-cancerous states or conditions and
                           with factors contributing to the development of the
                           disease.

                           Forecasting proficiency: knowledge of facts pertaining
                           to the prediction of the probable result of an attack of the
                           disease, particularly with reference to application of
                           surgical methods.

                           Biological characteristics II: academic knowledge of the
                           biological properties of tumors from the standpoint of
                           etiology, incidence, histology, and pathology, and laboratory
                           diagnostic procedures.


TABLE VI


INTERPRETATION OF FACTORS OBTAINED FROM SENIOR DENTAL STUDENT
PERFORMANCE ON ELEVEN TOPICS IN THE 1955 CANCER KNOWLEDGE
EXAMINATION


Factors  Topics  Loadings  Interpretation of Factors

I                          Diagnosis: knowledge of diagnosis of tumors by means of
                           history, physical examination, x-rays, laboratory and
                           special methods.

                           Biological characteristics: knowledge of oral pathology
                           relative to the biological characteristics of tumors,
                           incidence of tumors, metastasis, and histology and pathology
                           of tumors.

                           Forecasting proficiency: knowledge of facts relating to the
                           prediction of the probable result of an attack of the
                           disease (prognosis) and to the probability of containment or
                           spread of the disease (metastasis) with some reference to
                           application of surgical treatment.

                           General information: familiarity with commonly known
                           facts regarding pre-cancerous states or conditions and
                           with factors contributing to the development of the disease.

                           Treatment: knowledge of the nature and effectiveness of
                           methods of treatment.


Summary 


From a consideration of the data furnished by 
the factor analysis, the following conclusions 
may be drawn: 


1. At the freshman, sophomore, junior, and senior
levels dental students exhibited behavior
on a test of cancer knowledge that could be
expressed in terms of a number of psychologically
meaningful factors that were relatively
independent.


2. In each of the four analyses, five factors
appeared that seemed to be interpretable.


3. Except for the sample of juniors, the five
factors were identified as being (a) knowledge
about the biological characteristics of tumors,
(b) the possession of general information
concerning commonly known facts about
pre-cancerous states and factors contributing to the
development of the disease, (c) forecasting
proficiency, which was described as an ability
to predict from data given the outcome of an
attack and the probability of its containment
or spread, (d) knowledge about the diagnosis
of tumors from a description of a patient's
history, physical examination, x-rays,
laboratory and other findings, and (e) familiarity
with the nature and effectiveness of methods
of treatment.


4. In the analysis of the juniors a second factor
concerning biological characteristics emerged
whereas that pertaining to treatment did not
appear. It was hypothesized that differences
in the sequence of topics studied and in the
emphasis placed upon them in the several
participating colleges might account for the
results obtained for the given group.


5. There was the suggestion that in view of the
differences in the groupings of sub-tests, or
topics, that served to identify a given factor,
the nature of the factor may have changed
somewhat in terms of the amount and type of
instruction received by students in their dental
program.


6. The relatively low correlations among factors
appearing in each of the four analyses did not
support either the hypothesis of the existence
of a second order or general factor, or the
hypothesis of a consistent change toward either
greater integration or greater differentiation
in the dimensions of cancer knowledge as a
function of the amount of time spent in dental
school.


TESTS OF HYPOTHESES ABOUT DIFFERENCES
BETWEEN TWO INTERCORRELATION MATRICES


K. WARNER SCHAIE 


University of 


PSYCHOLOGICAL test batteries are fre- 
quently administered to samples differing in one 
or more characteristics from the population on 
which the test has been standardized. When this 
is the case one has often reason to suspect the 
validity of sub-test as well as composite scores. 
Many investigators try to reassure themselves 
and their readers by presenting the intercorrela- 
tions obtained for their experimental group as 
compared with the intercorrelations for the norm 
group. From inspection the inference is then 
made that there does not seem to be any differ- 
ence between the two matrices. 

If the equivalence of two intercorrelation ma- 
trices is to be accepted as evidence for applying 
a test battery to a different population, such 
equivalence needs to be demonstrated by more 
rigorous procedures than visual inspection or 
graphic plotting. The problem is complicated, 
however, by the fact that weare in essence faced 
with tests of hypotheses about series of non-in- 
dependent statistics. From the point of view of 
the mathematician this may mean that an exact 
procedure for a multi-variate problem will be 
computationally prohibitive or even impossible 
to derive. On the other hand, some arbitrary 
(though it is felt not unreasonable) assumptions 
may make it possible to describe some practical 
procedures, which in general should permit ap- 


proximations to upper bound confidence state- 
ments. 


The Two-Variable Case 





Consider the case where the null hypothesis 
is to be tested for the difference between two 
product-moment correlation coefficients yielded 
by computing the relation between two variables 
for two independent samples. The convention- 
al procedure (2:131-33) would involve trans for- 
mation of the two coefficients into Fisher’s z', 
computing the standard error of the difference 
between z's (which will take into account the dif- 
ference in N, if any), obtaining the difference 
between the two z's and evaluating the ratio 





Nebraska 


Dz' /oz in terms of the normal probability curve. 
One can then reject or refuse to reject the null 
hypothesis at whatever confidence level one 
wishes to specify. 
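The two-variable procedure can be sketched in a few lines. This is a minimal illustration, assuming the usual standard error of the difference between two independent z' values, sqrt(1/(N1 - 3) + 1/(N2 - 3)); the function names and the sample correlations are hypothetical and do not come from the paper.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher's z' transformation of a product-moment correlation."""
    return math.atanh(r)


def z_difference_test(r1, n1, r2, n2):
    """Return (significance ratio, two-tailed p) for two independent r's."""
    diff = fisher_z(r1) - fisher_z(r2)
    # Standard error of the difference between two independent z' values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    ratio = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))
    return ratio, p


# Hypothetical correlations from two samples of 103 cases each.
ratio, p = z_difference_test(0.50, 103, 0.30, 103)
print(round(ratio, 3), round(p, 3))
```

The ratio returned here is the quantity that is evaluated against the normal probability curve.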


The n-Variable Case 





When the operations suggested by the two-variable
case are performed for every corresponding
pair of correlation coefficients in two intercorrelation
matrices one obtains a table of n(n-1)/2
significance ratios expressed in normal deviate
units. While the test of the null hypothesis
in the two-variable case merely involved assessing
the chance probability of the occurrence of
one difference of a given magnitude, the problem 
now is that of assessing the overall significance 
of a series of statistics. 

Intuitive analysis will suggest two possible 
approaches for a test of the null hypothesis for 
the overall significance of differences between 
two such matrices. The simplest approach 
would be to assume that the average correlation 
among the elements of the matrix of differences 
is zero and then utilize nomographs such as
those given by Sakoda, Cohen and Beall (6), to
ascertain whether the number of significant differences
obtained at a given confidence level is within
the limits of chance probability for a specified
series of statistical tests.
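Where no nomograph is at hand, the same question can be put to a binomial tail probability. The sketch below assumes the N tests are independent, as the zero-average-correlation assumption above implies; whether the cited nomographs are built on exactly this calculation is an assumption, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n_tests, alpha):
    """Chance probability that k or more of n_tests independent tests
    reach significance at level alpha when the null hypothesis is true."""
    return sum(
        comb(n_tests, j) * alpha ** j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Ten significance ratios (a five-variable matrix), each tested at the 5% level.
for k in range(1, 4):
    print(k, round(prob_at_least(k, 10, 0.05), 4))
```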

A serious limitation of the above method is 
due to the fact that only extreme differences will 
be considered in such an evaluation. This approach
may lead investigators to be rather prone
to accept the null hypothesis where there is a
true overall difference and unless the matrices
are large it may often be impossible for certain
values of r to reject the null hypothesis at a
conventionally acceptable level of confidence.


Tests Involving Distribution Functions 





A more powerful test may be possible if the 
null hypothesis is restated. The condition for 
accepting the equivalence of two correlation matrices
shall now require that the cumulative frequency
distribution of observed significance ratios
for the differences between the corresponding
values of z' does not differ significantly
from the cumulative integral of the normal er- 
ror curve for a specified frequency. This con- 
dition makes it necessary to evaluate not only 
extreme differences, but also take into account 
every discrepancy between the two matrices, re- 
gardless of their magnitude. This hypothesis 
can best be evaluated by the Kolmogorov-Smir- 
now test for goodness of fit. 

Fairly non-mathematical descriptions of this 
test may be found in Goodman (3) and in Massey 
(5). The assumptions required for this test are 
that the population cumulative frequency function 
be completely specified and that each member of 
the sample distribution be distinct. No other 
assumptions need be made with respect to the 
characteristics of the sample to be evaluated 
except that its values must also be ordered in the
form of a cumulative distribution. The first assumption
is met by using the cumulative integral
of the normal error curve which is completely
specified for N’s of any given size. As to the
second assumption it seems reasonable to assume
that the average correlation among the differences
between corresponding pairs of z's will
be zero.*

The test proceeds as follows: From a table 
of the normal probability curve the cumulative 
step function of theoretical values is written for 
a population where N = n(n-1)/2. The cumula- 
tive frequencies of observed values falling below 
each theoretical value are tabulated and the theo- 
retical frequencies are subtracted from the ob- 
served frequencies. The largest positive dis- 
crepancy is then divided by N to obtain a propor- 
tion which may be entered into tables of critical 
values of d (where d is defined as the largest permissible
discrepancy between a specified and an
observed distribution), as given by Massey (4).
If the largest positive discrepancy does not ex- 
ceed the tabled value one may then conclude that 
the null hypothesis is plausible and that the two 
correlation matrices could be chance deviations 
from the same population correlation matrix. 
Massey’s tables do not provide sufficient entries 
for the magnitude of N arising out of problems 
discussed in this paper. Critical values for the 
1%, 5% and 10% levels of significance are there- 
fore given in Table I for matrices containing 
from 4 to 20 variables. 
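The procedure just described can be sketched as follows. Two points are assumptions rather than statements of the paper: the significance ratios are taken in absolute value, so that the theoretical limits are those of the folded normal curve, and the discrepancy is formed as observed minus theoretical cumulative frequency. The function names are hypothetical.

```python
from statistics import NormalDist


def theoretical_limits(n_stats):
    """Upper limits of |normal deviate| for cumulative frequencies 1..N-1."""
    nd = NormalDist()
    # P(|Z| < t) = i/N  is equivalent to  t = inv_cdf((1 + i/N) / 2).
    return [nd.inv_cdf((1 + i / n_stats) / 2) for i in range(1, n_stats)]


def largest_positive_discrepancy(ratios):
    """Largest positive (observed - theoretical) cumulative frequency, over N."""
    n_stats = len(ratios)
    absolute = [abs(x) for x in ratios]
    best = 0
    for i, limit in enumerate(theoretical_limits(n_stats), start=1):
        observed = sum(1 for v in absolute if v < limit)
        best = max(best, observed - i)
    return best / n_stats


# With five variables there are ten ratios; the limits run from about .13 to 1.64.
print([round(t, 2) for t in theoretical_limits(10)])
```

The returned proportion is the value to be compared with the tabled critical d for the given number of variables.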





Computational Example 





In a paper by Schaie, Rosenthal and Perlman 
(7), intercorrelations for Thurstone’s SRA Pri- 
mary Mental Abilities test are given for two 
samples. One of these samples is Thurstone’s 
standardization group composed of high school 
students and the second sample represents 
Schaie’s population of unselected individuals in 
the age group 58 to 78 years. The hypothesis to 
be tested is that the two sets of intercorrelations 
could be chance deviations from the universe cor- 
relation matrix for the Primary Mental Abilities. 
Implicitly the proposition to be tested is that the 
internal structure of the Primary Mental Abili- 
ties battery remains constant when applied to a 
population differing from the standardization 
group. 

Columns 1 and 2 of Table II give the correla- 
tions for Thurstone’s and Schaie’s samples. Col- 
umns 3 and 4 give the corresponding z' values, 
and column 5 shows their difference. The stand- 
ard error of the difference between z's for the 
two samples is .1386 and when the values in column
5 are divided by this constant one obtains
significance ratios expressed in terms of normal
deviates. These latter values are given in column
6 of Table II. Table III shows the
Kolmogorov-Smirnow test. Column 1 lists the
cumulative step-function from 1 to N. Column 2
indicates the corresponding upper limits obtained
from the cumulative integral of the normal error
curve. Column 3 contains the corresponding cumulative
frequencies obtained from column 6 of
Table II. Column 4 gives the differences obtained
by subtracting column 3 from column 1.
The largest positive discrepancy in column 4
turns out to be zero. From Table I it may be
noted that a value of d = .139 would be significant
at the 1% level. An observed positive discrepancy
of 1 would therefore have still been permissible.
All discrepancies in the example are zero
or negative and it may be concluded that the null
hypothesis is plausible and the two intercorrelation
matrices could be chance deviations from
the universe intercorrelation matrix.


An Approximation to the Test
of Significance

The method illustrated in the preceding paragraphs
becomes rather laborious when one deals
with large correlation matrices, since extensive
interpolations may be required to tabulate the




*A more exact way of meeting the assumption of distinctness would be to compute the distribution of
$D_{z'}/\sigma_{D_{z'}}$ ratios from the differences between the corresponding pairs of entries of the inverses of the
correlation matrices. The effect of partialling out the variance of other matrix elements, however,
would probably be random in most cases, and the considerable additional labor may therefore be unjustified.





TABLE I

CRITICAL VALUES FOR THE KOLMOGOROV-SMIRNOW TEST AT THE 1%, 5% AND
10% LEVELS FOR DISTRIBUTIONS OF SIGNIFICANCE RATIOS FOR DIFFERENCES
BETWEEN INTERCORRELATION MATRICES

No. of          No. of                 Critical Values
Variables       Statistics        1%         5%         10%

     4               6                      .212       .233
     5              10           .139       .164       .180
     6              15                      .134       .147
     7              21                      .113       .124
     8              28                      .098       .108
     9              36                      .087       .095
    10              45                      .078       .085
    11              55                      .070       .077
    12              66                      .064       .070
    13              78                      .059       .065
    14              91                      .055       .060
    15             105                      .051       .056
    16             120                      .047       .052
    17             136           .038       .045       .049
    18             153           .036       .042       .046
    19             171           .034       .040       .044
    20             190           .032       .038       .041

More than 20                     $.44/\sqrt{n(n-1)/2}$     $.52/\sqrt{n(n-1)/2}$     $.57/\sqrt{n(n-1)/2}$

TABLE II

COMPUTATIONAL EXAMPLE FROM A PAPER BY SCHAIE, ROSENTHAL AND PERLMAN
(7), SHOWING COMPUTATION OF THE SIGNIFICANCE RATIOS
(See text for explanation)

[Columns (1) to (5) are not legible. Entries surviving for column (6), $D_{z'}/\sigma_{D_{z'}}$:
. 52, . 52, - 45, . 32, . 55, . 22, . 44, . 00, 91]











TABLE III

COMPUTATIONAL EXAMPLE SHOWING THE KOLMOGOROV-SMIRNOW
TEST FOR DATA FROM TABLE II
(See text for explanation)

[Columns (1), (3) and (4) are not legible. Entries surviving for column (2), the upper limits
from the cumulative integral of the normal error curve:
.13, .25, .39, .52, .68, .84, 1.04, 1.28, 1.64]

Maximum positive d = 0
P<.01

normal deviate limits for the theoretical cumulative
step function from conventional tables of
the normal error curve. With matrices containing
more than 15 variables (and thus 105 or
more statistics) a reasonable short cut may be
suggested which should result in a fairly good
approximation to the exact Kolmogorov test.

Although earlier papers (4) suggested that the
theoretical distribution must not only be completely
specified, but must also be continuously
distributed, it has since been shown that the tabled
critical values will be adequate for discrete
distributions (1,3). The following procedure is 
suggested for this simplification. 

Arrange values of the theoretical cumulative
step-function in intervals of .20$\sigma$ or .15$\sigma$ width.
This will give a maximum of 15 or 20 intervals
(depending on the magnitude of N). Now tabulate
the observed values and arrange in a cumulative
frequency distribution with the same intervals.
Subtract the observed from the theoretical cumulative
frequency values. Divide the largest positive
discrepancy by N and enter Table I as previously
indicated.
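A minimal sketch of this grouped short cut follows. It assumes absolute significance ratios, interval boundaries at successive multiples of the chosen width, and a discrepancy formed as observed minus theoretical cumulative frequency (the sign convention of the exact procedure described earlier); the function name and defaults are hypothetical.

```python
from statistics import NormalDist


def binned_discrepancy(ratios, width=0.20, n_intervals=15):
    """Largest positive (observed - theoretical) cumulative frequency over N,
    evaluated only at interval boundaries width, 2*width, ..."""
    nd = NormalDist()
    n_stats = len(ratios)
    absolute = [abs(x) for x in ratios]
    best = 0.0
    for k in range(1, n_intervals + 1):
        bound = k * width
        # Expected number of |ratios| not exceeding the boundary under the null.
        theoretical = n_stats * (2 * nd.cdf(bound) - 1)
        observed = sum(1 for v in absolute if v <= bound)
        best = max(best, observed - theoretical)
    return best / n_stats


# Sixteen variables give 120 ratios; .20-sigma intervals up to 3 sigma.
print(binned_discrepancy([0.0] * 120))
```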


Summary 


This paper presents a discussion of intuitive approximations
to a test of significance for the overall
discrepancies between intercorrelation matrices
for the same test battery given to two independent
samples. The approach is equally applicable
to test the significance of discrepancies
between the intercorrelation matrix for a standardization
population and the intercorrelations
obtained for an experimental group drawn from
a population differing in one or more characteristics
from the standardization group.

The simplest approximation involves obtain- 
ing significance ratios for the difference between 
each corresponding pair of r's after transforma- 
tion to Fisher’s z'. The overall significance 
test then involves comparing the number of ob- 
tained significant differences with those attribu- 
table to chance in a given distribution of statis- 
tics. It is pointed out that for small matrices 
this test would accept the null hypothesis in most 
cases even if it were false. 

The suggested method involves use of the 
Kolmogorov-Smirnow test of goodness of fit to
test the discrepancy of the obtained distribution
of significance ratios from the theoretical distri- 
bution specified by the cumulative integral of 
the normal error curve. A simplified method 
of computation is suggested for large matrices, 
a computational example is presented and a ta- 
ble of critical values of maximum permissible 
discrepancies is given.

TEST OF SIGNIFICANCE OF DIFFERENCES
OF CHANGES


CYRIL J. HOYT - P. R. KRISHNAIAH 
University of Minnesota 


Abstract or Summary 





THIS PAPER shows a procedure for testing
the significance of the differences between the 
means of two successive measurements of the 
same variable on samples from a number of 
strata of a given population. This test uses as 
the basic data the sums of squares and cross 
products of the two measurements for the sepa- 
rate samples. Though this test has previously 
been well known the current paper shows means 
for obtaining the sums of squares required in an- 
alysis of variance table for the differences from 
corresponding tables for the two separate meas- 
urements. 

The need for the procedure reported arose in 
a study of the relations between the responses of 
teachers to certain questionnaire items and 
changes in the means of a certain quantitative 
measure taken at an interval of three years.1*
The appropriate test for determining the significance
of the mean change for any one group has
been well known and included in standard texts.2
Likewise, the significance of the
differences among two or more sample means of
individual changes can be tested by a straightforward
application of analysis of variance using
the individual’s change as the variable under consideration.
This paper shows a means for obtaining
the entries for such an analysis of variance
table without the necessity of computing the
individual change scores provided the cross- 
product sums are available for each sample in 
addition to the sums and sums of squares for the 
two separate scores on the quantitative variable 
concerned. 

Let $X_{ij}$ and $Y_{ij}$ represent the initial and second
score obtained by the ith individual in the jth
group on the quantitative variable whose change
is under consideration.

Let $n_j$ represent the number of individuals in
the jth group.

Let K = the number of groups and
N = the total number of individuals in all
groups.

Let $d_{ij} = X_{ij} - Y_{ij}$.





*See footnotes at end of article 





Let $\bar{d}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} d_{ij}$

Let $\bar{d} = \frac{1}{N} \sum_j n_j \bar{d}_j$

Tables I, II, III and IV express the sums of
squares and cross products ordinarily used in
one-way classification analysis of variance. In
the paragraphs below the entries for Table I are
expressed in terms of the entries in Tables II, III
and IV. There are also occasions in certain research
problems in which it may be used with
slight modifications of the expressions below for 
analysis of variance of the sums of two scores 
or certain other linear functions of the two scores 
obtained. 

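The sum of squares for the total line in Table
I can be expressed as

$$\sum\sum (d_{ij} - \bar{d})^2 = \sum\sum d_{ij}^2 - \frac{1}{N}\Big[\sum\sum d_{ij}\Big]^2 \tag{1}$$

But

$$\sum\sum d_{ij}^2 = \sum\sum (X_{ij} - Y_{ij})^2 = \sum\sum X_{ij}^2 + \sum\sum Y_{ij}^2 - 2\sum\sum X_{ij}Y_{ij} \tag{2}$$

and

$$\Big[\sum\sum d_{ij}\Big]^2 = \Big[\sum\sum (X_{ij} - Y_{ij})\Big]^2 = \Big[\sum\sum X_{ij}\Big]^2 + \Big[\sum\sum Y_{ij}\Big]^2 - 2\Big[\sum\sum X_{ij}\Big]\Big[\sum\sum Y_{ij}\Big] \tag{3}$$

By substituting (2) and (3) in (1), the total sum
of squares can be expressed as

$$\sum\sum (d_{ij} - \bar{d})^2 = \sum\sum X_{ij}^2 - \frac{1}{N}\Big[\sum\sum X_{ij}\Big]^2 + \sum\sum Y_{ij}^2 - \frac{1}{N}\Big[\sum\sum Y_{ij}\Big]^2 - 2\Big\{\sum\sum X_{ij}Y_{ij} - \frac{1}{N}\Big[\sum\sum X_{ij}\Big]\Big[\sum\sum Y_{ij}\Big]\Big\} \tag{4}$$

Hence the sum of squares for the total line
in Table I can be expressed directly in terms of
the entries in Tables II, III and IV. In these
terms, equation (4) becomes

$$C_1 = C_2 + C_3 - 2C_4 \tag{5}$$

TABLE I

ANALYSIS OF VARIANCE TABLE FOR ONE-WAY CLASSIFICATION OF CHANGES

Source of
Variation           S.S.                                              d.f.      Mean Square

Between Groups      $\sum_j n_j (\bar{d}_j - \bar{d})^2 = A_1$          K - 1     $A_1/(K-1)$
Within Groups       $\sum\sum (d_{ij} - \bar{d}_j)^2 = B_1$            N - K     $B_1/(N-K)$

Total               $\sum\sum (d_{ij} - \bar{d})^2 = C_1$              N - 1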





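TABLE II

ANALYSIS OF VARIANCE TABLE FOR ONE-WAY CLASSIFICATION OF
INITIAL SCORES

Source of
Variation           S.S.

Between Groups      $\sum_j \frac{1}{n_j}\big[\sum_i X_{ij}\big]^2 - \frac{1}{N}\big[\sum\sum X_{ij}\big]^2 = A_2$
Within Groups       $\sum\sum X_{ij}^2 - \sum_j \frac{1}{n_j}\big[\sum_i X_{ij}\big]^2 = B_2$

Total               $\sum\sum X_{ij}^2 - \frac{1}{N}\big[\sum\sum X_{ij}\big]^2 = C_2$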





TABLE III

ANALYSIS OF VARIANCE TABLE FOR ONE-WAY CLASSIFICATION OF
FINAL SCORES

Source of
Variation           S.S.

Between Groups      $\sum_j \frac{1}{n_j}\big[\sum_i Y_{ij}\big]^2 - \frac{1}{N}\big[\sum\sum Y_{ij}\big]^2 = A_3$
Within Groups       $\sum\sum Y_{ij}^2 - \sum_j \frac{1}{n_j}\big[\sum_i Y_{ij}\big]^2 = B_3$

Total               $\sum\sum Y_{ij}^2 - \frac{1}{N}\big[\sum\sum Y_{ij}\big]^2 = C_3$

Similarly the sum of squares for the ‘‘within
groups’’ line in Table I can be expressed in terms
of the two original measurements of the variable
whose change is studied.


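$$\begin{aligned}
\sum\sum (d_{ij} - \bar{d}_j)^2 &= \sum\sum d_{ij}^2 - \sum_j \frac{1}{n_j}\Big[\sum_i d_{ij}\Big]^2 \\
&= \sum\sum X_{ij}^2 + \sum\sum Y_{ij}^2 - 2\sum\sum X_{ij}Y_{ij} \\
&\quad - \sum_j \frac{1}{n_j}\Big[\sum_i X_{ij}\Big]^2 - \sum_j \frac{1}{n_j}\Big[\sum_i Y_{ij}\Big]^2 + 2\sum_j \frac{1}{n_j}\Big[\sum_i X_{ij}\Big]\Big[\sum_i Y_{ij}\Big]
\end{aligned} \tag{6}$$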


By grouping the first and fourth terms of the
right member of (6) as well as the second and
fifth terms and next the third and sixth terms,
the within groups sum of squares for Table I
can be expressed in terms of sums of squares
and cross products in the second lines of Tables
II, III and IV. Thus equation (6) can be expressed
in terms of the entries in these tables as

$$B_1 = B_2 + B_3 - 2B_4 \tag{7}$$

Since the sums of squares are additive, the
determination of the ‘‘between groups’’ sum of
squares for Table I can most easily be obtained
by subtraction within this table, though the relation
(8) can also be used.

$$A_1 = A_2 + A_3 - 2A_4 \tag{8}$$
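Relations (5), (7) and (8) can be checked numerically. The sketch below is only an illustration: the scores are made up, the helper name is hypothetical, and the cross-product entries $A_4$, $B_4$, $C_4$ of Table IV are assumed to be formed exactly like the sums of squares, with products $X_{ij}Y_{ij}$ in place of squares.

```python
def one_way_entries(groups_u, groups_v):
    """Between (A), within (B) and total (C) sums of products of u and v.
    Passing the same data twice gives the ordinary sums of squares."""
    n = sum(len(g) for g in groups_u)
    raw = sum(u * v for gu, gv in zip(groups_u, groups_v) for u, v in zip(gu, gv))
    grouped = sum(sum(gu) * sum(gv) / len(gu) for gu, gv in zip(groups_u, groups_v))
    correction = sum(map(sum, groups_u)) * sum(map(sum, groups_v)) / n
    return grouped - correction, raw - grouped, raw - correction


# Hypothetical initial (X) and second (Y) scores for three groups.
X = [[10, 12, 9], [14, 15, 11, 13], [8, 7]]
Y = [[11, 15, 9], [13, 18, 12, 13], [9, 6]]
D = [[x - y for x, y in zip(gx, gy)] for gx, gy in zip(X, Y)]

A2, B2, C2 = one_way_entries(X, X)   # Table II
A3, B3, C3 = one_way_entries(Y, Y)   # Table III
A4, B4, C4 = one_way_entries(X, Y)   # Table IV (cross products)
A1, B1, C1 = one_way_entries(D, D)   # Table I, from the change scores directly

print(A1, A2 + A3 - 2 * A4)
print(B1, B2 + B3 - 2 * B4)
print(C1, C2 + C3 - 2 * C4)
```

Each printed pair should agree up to rounding, so the change scores themselves never need to be formed.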


Tables V, VI, and VII show the application of 
this test to data involving the significance of the 
differences among the mean changes from pre- 
test to post-test for 251 teachers who were clas- 
sified into seven groups based upon their rating 
of one aspect of their teaching communities. 

In the analyses shown in Tables V, VI and VII
the seven strata were determined on the basis of
the teachers’ ratings of ‘‘living conditions’’ in
their teaching communities. The analysis indicates
that neither the means on junior year MTAI
scores nor the mean changes differed significantly
for strata thus determined, though the means
on the current MTAI administration did differ
among the seven strata. Further analyses
showed that the stratum rating ‘‘living conditions’’
1 or 2 on a ten-point scale had higher
mean MTAI scores than other strata when considering
the scores obtained during their second
or third year of teaching experience, but did not 
show such a difference on their junior year 
MTAI score. 


FOOTNOTES 


1. The questionnaire responses consisted of 
teachers’ ratings of different aspects of their 
teaching community and working conditions. 
The quantitative variable was the teachers’ 
score on the Minnesota Teacher Attitude In- 
ventory which is indicative of the teachers’ 
ability to maintain good rapport with classroom
groups. One score on the MTAI was
obtained during the junior year of enrollment
in the College of Education teacher preparation
program. A second score on the MTAI
was obtained for each teacher at the same 
time as the community rating. This was in 
the second semester of the second or third 
year of teaching experience. 


2. Johnson, Palmer O. Statistical Methods in 
Research (New York: Prentice-Hall, 1949).

MAIN EFFECTS AND NON-ZERO INTERACTIONS
IN A TWO-WAY CLASSIFICATION


RAYMOND O. COLLIER, Jr. 
University of Minnesota 


Introduction 


THIS PAPER is concerned with examining
tests of hypotheses on main effects and interaction
in a simple two dimensional classification
assuming fixed effects. Specifically, we view
the question, ‘‘If the interaction effects are not
assumed to be zero, what is meant by a test of a
main effect and how is it made?’’


Tests on Interaction and on Main Effects 
Assuming Interactions are Zero 








First, let us write the ordinary classical
model in which we assume an observable quantity,
$X_{ijk}$, to denote the kth individual observation
in the ith row and jth column. This observation
is assumed to be characterized as follows:

$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} \tag{1}$$

where
$\mu$ is a fixed general effect
$\alpha_i$ is the fixed effect for the ith row
$\beta_j$ is the fixed effect for the jth column
$\gamma_{ij}$ is the fixed effect for cell i, j,
$\epsilon_{ijk}$, a random component referred to
as error or residual, is normally
and independently distributed with
mean 0, and variance $\sigma^2$
and i = 1, 2, ..., I; j = 1, 2, ..., J; k = 1, 2, ..., K


Now in order to estimate the parameters in (1) 
we impose the restrictions 


$$\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0 \tag{2}$$

Next for purposes of reference, we present in 
Table I the well-known analysis of variance ta- 
ble. In Table I the following definitions apply: 


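$$SS(\mu) = X_{...}^2/IJK \tag{3}$$

$$SS(\alpha) = \sum_i X_{i..}^2/JK - X_{...}^2/IJK \tag{4}$$

$$SS(\beta) = \sum_j X_{.j.}^2/IK - X_{...}^2/IJK \tag{5}$$

$$SS(\gamma) = \sum_i \sum_j X_{ij.}^2/K - \sum_i X_{i..}^2/JK - \sum_j X_{.j.}^2/IK + X_{...}^2/IJK \tag{6}$$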


$$SS(E) = \sum_i \sum_j \sum_k (X_{ijk} - X_{ij.}/K)^2 \tag{7}$$


(8) SS (T) 


where a dot indicates that the subscript it
replaces has been summed over, e.g.,
$X_{i..} = \sum_j \sum_k X_{ijk}$.
Now to test the hypothesis of no interaction, $H_0: \gamma_{ij} = 0$,
we may utilize the likelihood ratio test
by means of normal regression theory (see ref.,
1:168-87) in a straightforward manner to arrive
at the usual F-ratio

$$F(\gamma) = \frac{SS(\gamma)/(I-1)(J-1)}{SS(E)/IJ(K-1)} \tag{9}$$


Next if we assume that the $\gamma_{ij}$ are all zero, so that
(1) can be written as

$$X_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}, \tag{10}$$

then as Johnson (ref., 2:220) has indicated, the
likelihood ratio test of the hypothesis of no row
main effect, $H_0: \alpha_i = 0$, is made by means of the
ratio

$$F(\alpha) = \frac{SS(\alpha)/(I-1)}{[SS(E) + SS(\gamma)]/[IJ(K-1) + (I-1)(J-1)]} \tag{11}$$

the denominator mean square being formed by
pooling SS(E) and SS($\gamma$).

Tests on Main Effects Assuming Interactions 
Are Not Zero 


When we attempt to test $H_0: \alpha_i = 0$ in (1),
however, and do not assume the $\gamma_{ij} = 0$, we encounter
some difficulty. Thus following normal
regression theory we minimize the residual sum
of squares

$$\sum_i \sum_j \sum_k (X_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij})^2 \tag{12}$$

under no hypothesis by setting equal to zero the
partial derivatives of (12) with respect to $\mu$, $\alpha_i$,
$\beta_j$, and $\gamma_{ij}$. The following estimates of these parameters
result:

$$\begin{aligned}
\hat{\mu} &= X_{...}/IJK \\
\hat{\alpha}_i &= X_{i..}/JK - X_{...}/IJK \\
\hat{\beta}_j &= X_{.j.}/IK - X_{...}/IJK \\
\hat{\gamma}_{ij} &= X_{ij.}/K - X_{i..}/JK - X_{.j.}/IK + X_{...}/IJK
\end{aligned} \tag{13}$$


Substituting these values in (12) we have the 
usual ‘‘within cells’’ or error sum of squares, 
SS(E). 

Now after imposing the hypothesis, $H_0: \alpha_i = 0$,
and taking the minimum of (12) under this hypothesis,
we have the following estimates of the parameters:

$$\begin{aligned}
\hat{\mu} &= X_{...}/IJK \\
\hat{\alpha}_i &= 0 \\
\hat{\beta}_j &= X_{.j.}/IK - X_{...}/IJK \\
\hat{\gamma}_{ij} &= X_{ij.}/K - X_{.j.}/IK
\end{aligned} \tag{14}$$

Substituting these estimates we obtain the minimum
of (12) under the hypothesis to be:

$$SS'(E) = \sum_i \sum_j \sum_k (X_{ijk} - X_{ij.}/K)^2 \tag{15}$$


which is, of course, identical with SS(E). 

According to normal regression procedures
the sum of squares, SS'(E) - SS(E), represents
the sum of squares due to the $\alpha_i$ and the hypothesis,
$H_0: \alpha_i = 0$, is tested by forming

$$F(\alpha) = \frac{[SS'(E) - SS(E)]/(I-1)}{SS(E)/IJ(K-1)} \tag{16}$$

Since the numerator of this ratio is zero it seems
that no likelihood ratio test of the row main effect
hypothesis is possible, at least if we define
a row effect as $\alpha_i$ and the hypothesis on these
constants as $\alpha_i = 0$. However, it will be seen
in the next section that upon defining the row
main effects differently a test of this hypothesis
is possible.


An Altered Model and Tests of Main Effects Not 
Assuming Interactions to be Zero 








Let us view the whole problem in a slightly
different light and write the linear model of (1)
as

$$X_{ijk} = \zeta_{ij} + \epsilon_{ijk} \tag{17}$$

where $\zeta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, the quantity
representing the mean of cell i, j,
and $\epsilon_{ijk}$ is defined as in (1).

Suppose following (17) we define a main effect
constant for the ith row more specifically (see
ref., 3:91) as the mean of the cell means for the
ith row, $\zeta_{i.}/J$. Under this definition the previous
row effect hypothesis, $H_0: \alpha_i = 0$, becomes
equivalent to $H_0: \zeta_{i.}/J = \zeta$, that these main effects
newly defined are equal, no assumption of
zero interactions being made. This is seen to be
true since $\zeta = \zeta_{..}/IJ = \mu$ from (17) and $\zeta_{i.}/J = \mu + \alpha_i$.

Again following normal regression theory we
minimize

$$\sum_i \sum_j \sum_k (X_{ijk} - \zeta_{ij})^2 \tag{18}$$

with respect to the $\zeta_{ij}$ to obtain the estimates

$$\hat{\zeta}_{ij} = X_{ij.}/K \tag{19}$$

and on substituting in (18) we obtain its minimum
to be SS(E) as before.

Imposing the hypothesis $H_0: \zeta_{i.}/J = \zeta$, we minimize
(18) again under this condition. Thus, by
the method of Lagrangian multipliers, the minimum
of (18) under this hypothesis becomes

$$SS''(E) = \sum_i \sum_j \sum_k (X_{ijk} - X_{ij.}/K)^2 + \sum_i X_{i..}^2/JK - X_{...}^2/IJK = SS(E) + SS(\alpha)$$

and the sum of squares due to the hypothesis is

$$SS''(E) - SS(E) = SS(\alpha). \tag{20}$$


The test of the main effect hypothesis, $H_0: \zeta_{i.}/J = \zeta$,
is made by referring

$$F(\alpha) = \frac{SS(\alpha)/(I-1)}{SS(E)/IJ(K-1)} \tag{21}$$

to the F-distribution with (I - 1) and IJ(K - 1) degrees
of freedom as one would expect on referring
to the expected mean square column of Table
I. Notice that the denominator sum of
squares is simply SS(E) alone. It is clear that
the test of column effects, $H_0: \zeta_{.j}/I = \zeta$, follows
completely parallel lines.
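The row main effect test with SS(E) alone in the denominator can be sketched as below. The sketch assumes an equal number K of observations in every cell; the function name and the small data set (which has a non-zero interaction) are hypothetical.

```python
def row_main_effect_f(x):
    """x[i][j] is the list of K observations in row i, column j.
    Returns SS(alpha), SS(E), the F ratio and its degrees of freedom."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell_sum = [[sum(x[i][j]) for j in range(J)] for i in range(I)]   # X_ij.
    row_sum = [sum(cell_sum[i]) for i in range(I)]                    # X_i..
    grand = sum(row_sum)                                              # X_...
    ss_alpha = sum(r * r for r in row_sum) / (J * K) - grand ** 2 / (I * J * K)
    ss_error = sum(
        (obs - cell_sum[i][j] / K) ** 2
        for i in range(I) for j in range(J) for obs in x[i][j]
    )
    df_num, df_den = I - 1, I * J * (K - 1)
    f_ratio = (ss_alpha / df_num) / (ss_error / df_den)
    return ss_alpha, ss_error, f_ratio, (df_num, df_den)


# Hypothetical 2 x 2 layout with two observations per cell.
data = [[[1, 3], [2, 4]], [[5, 7], [9, 11]]]
ss_alpha, ss_error, f_ratio, dfs = row_main_effect_f(data)
print(ss_alpha, ss_error, f_ratio, dfs)
```

The F ratio is then referred to the F-distribution with the returned degrees of freedom; the column test follows by interchanging the roles of rows and columns.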


Summary 


It seems from the above that likelihood ratio 
tests of row or column main effects can be made
in a two-way classification even if interactions
are assumed to exist. It is necessary that these
main effects be specifically defined to be the row
or column means of cell means. However, such
tests of main effects, when interactions exist,
may provide limited information and other tests
of hypotheses on cell means for each row
or cell means for each column may be desirable.

SIGNIFICANCE LEVEL IN FACTORIAL DESIGN


RICHARD McHUGH 
University of Minnesota 


THE ADVANTAGES of factorial design have 
frequently received emphasis. As a result, use 
of the factorial experiment in psychological research
is increasing (4). However, the analysis
of data from factorial experiments confronts the
researcher with an important statistical issue
which has not been given prominence in the psychological
literature.

Before the investigator began an analysis of 
the data of Table I (1:226), suppose he planned to 
test at the 1% level the significance of A, the 
main effect of the factor ‘‘number of presenta- 
tions.’’ From a table of the ordinary analysis 
of variance F distribution, for 1 and 72 degrees 
of freedom, $F_{1\%}$ = 7. Hence the investigator
would correctly conclude that A is significant 
since the observed F for A is 105.07. 

Now, a basic presupposition of the convention- 
al F test is that the researcher plans to test just 
one effect per experiment, e.g., A as in the fore- 
going. However, suppose—as is likely to be the 
realistic case in practice—the researcher 
wished to test not just one effect but all the ef- 
fects: A, B, C, AB, AC, BC and ABC. In this 
case, of multiple tests of significance, it is in- 
adequate to use a critical F value of 7 for testing 
the F ratios corresponding to all six independent 
effect mean squares. For example, it is apparent
that the largest F ratio is more likely to be
inferred as significant than any of the other variance
ratios. Thus, to take an extreme case,
with 20 independent mean squares for examination,
the largest F ratio would be expected to be
significant at the 5% (or 1 in 20) level, even
though under the null hypothesis all the mean
squares estimate the same variance.

Obviously, then, those of the c independent ef- 
fects in an orthogonal analysis of variance table 
which are declared non-significant via the con- 
ventional F tables are indeed non-significant, 
i.e., would a fortiori be judged nonsignificant if 
the appropriate F tables were available. But the
significance of any effect judged to be significant
on the basis of the conventional F test may be
spurious.

An appropriate test procedure for this multi- 
ple significance test situation, so common in fac- 
torial analyses, will now be presented (3). 

Suppose, by way of illustration, that a 5%
test of significance is desired. The general technique
to be described ensures a significance level
which will not exceed 5%, for all effects tested.
It is convenient to make application first to
Table II. The steps are:

1. Rank the c effect mean squares on the basis
of their apparent significance, i.e., on the
basis of the nominal significance probabilities,
say P, associated with the observed variance ratio,
$F_o$, of each mean square, where P = probability
{F > $F_o$ | null hypothesis is true}.

For example, in Table II, interpolation in the
ordinary F tables gives for the main effect of B
a value for P of P = probability {$F_{2,96} \geq 3.46$ |
no real effect for B} = .037. In this fashion, the
following ranking is obtained for the effects of
Table II:

Rank, r       P          Effect

   1        < .001       A, schools
   2         .017        C, methods of instruction
   3         .037        B, instructors
   4         .041        ABC
   5         .068
   6        > .20
   7        > .20

At this first stage, certain effects can be declared
non-significant at once. Thus clearly BC,
AB and AC cannot be significant at an appropriately
conservative $\alpha$ = 5% level, since they are
not significant at the nominal 5% level, say $\alpha'$.

2. The Hartley test now proceeds sequentially.
At each step, the test is made by comparing
the P for a specified effect with $\alpha'/k$, where
$\alpha'$ = the nominal significance level and k = c - r
+ 1, where c is the total number of effects and r
is the rank determined in step (1). First, out of
all the c effects the one with the maximum nominal
significance, i.e., minimum P, say $P_1$, is
tested by comparing $P_1$ with $\alpha'/k = \alpha'/c$ here
(since in this case k = c - 1 + 1). If $P_1 < \alpha'/c$,
i.e., if the effect is actually significant, testing
is continued. Otherwise it stops, since there is
no possibility of an effect with a larger P being
significant.

For example, in Table II, c = 7 and for the
main effect of A, $P_1$ < .001 which is < 5%/7 =

TABLE I

ANALYSIS OF VARIANCE OF RETENTION SCORES (1:226)

Source                           df      MS

A (number of presentations)       1
B (modes of presentation)         1
C (times of testing)              1
AB                                1
AC                                1
BC                                1
ABC                               1

Error                            72





TABLE II

ANALYSIS OF VARIANCE OF ACHIEVEMENT SCORES (1:246)

Source                           df      SS      MS

A (schools)                       3
B (instructors)                   2
C (methods of instruction)
AB
AC
BC
ABC

Error                            96

.007, and hence the main effect of A is significant.

3. Next, the second ranking effect is tested,
significance being declared if $P_2 < \alpha'/k = \alpha'/(c - 1)$
here (since now k = c - 2 + 1), non-significance
otherwise.

For example, the main effect C of ‘‘methods of
instruction’’ in Table II has the second smallest P and
this effect is judged not significant since $P_2$ =
.017 > 5%/6 = .008. In this illustration, the sequential
testing is now complete, with a verdict
of significance for A, and non-significance for B,
C, AB, AC, BC and ABC. This should be contrasted
with the rashness of the inappropriate
conventional F test which returns not only A but
also B, C, and ABC as significant, with only AB,
AC and BC as non-significant.

4. In the above illustration based on Table II,
specific testing ceased with $P_2$ versus $\alpha'/(c - 1)$.
In general, the test procedure continues, sequentially,
[$P_1$ versus $\alpha'/c$, $P_2$ versus $\alpha'/(c - 1)$,
etc.,] until a non-significant result is obtained
(as in the present illustration), or until all c mean
squares are declared significant.
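The four steps can be sketched as a short routine. The function name is hypothetical, and so are some of the inputs: the text reports P only as an inequality for A and for the three smallest effects, and does not say which of AB, AC and BC carries which P, so the values entered for those four effects are placeholders consistent with the reported inequalities.

```python
def sequential_test(p_values, alpha=0.05):
    """Rank effects by nominal P and compare the r-th smallest P with
    alpha / (c - r + 1), stopping at the first non-significant result."""
    c = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    significant = []
    for r, (effect, p) in enumerate(ranked, start=1):
        k = c - r + 1
        if p < alpha / k:
            significant.append(effect)
        else:
            break   # no effect with a larger P can be significant
    return significant


# P values for C, B and ABC are those quoted in the text; the rest are placeholders.
table_ii = {"A": 0.0005, "C": 0.017, "B": 0.037, "ABC": 0.041,
            "BC": 0.068, "AB": 0.25, "AC": 0.30}
print(sequential_test(table_ii))
```

With these inputs only A is returned, in agreement with the verdict reached in the text.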

Naturally, Hartley’s sequential F-test technique
is not restricted to the 5% level, but is applicable,
in general, to a level of significance,
say $\alpha$, defined as ‘‘the proportion of experiments
with at least one erroneous inference’’ (3).
This is a definition appropriate to the case of
several (multiple), say c, inferences in an experiment.
A special case of the above definition of
$\alpha$ occurs for c = 1, viz., just the conventional
definition of a significance level as ‘‘the proportion
of experiments with one erroneous inference’’
(3).

In actual application, further simplifications
are possible. Thus, for tables such as Table II,
it is not necessary to compute the significance
probabilities $P_i$ in all cases. For example, suppose
the number of mean squares, c, is $\leq$ 50
(highly likely in practice). Then since .05/(c -
r + 1) is $\geq$ .001 (i.e., this critical value for c =
50 and for r = 1, 2, 3, ... has successively the
values: .001, .00102, .00104, ...), clearly any
F ratio exceeding the .001 point (corresponding
to the degrees of freedom of that ratio) is significant
at the $\alpha$ = .05 level. Hence it is unnecessary
to calculate $P_i$, except for F ratios smaller
than the .001 points. By way of illustration,
corresponding to A with 3 and 96 degrees of freedom,
the .001 point of the F distribution is approximately
4.6 (5). Since the observed F of
6.28 exceeds this .001 point, A is declared significant
at the 5% level.

Finally, for tables such as Table I, i.e., having
each of the factorial effects mean squares A,
B, AB, etc., based on 1 degree of freedom, a
further simplification is possible. Instead of a
ranking of the effects on the basis of their apparent
significance probabilities, the same ranking
can be based directly on the observed F ratios.
(This is possible because the $P_i$ and $F_i$ of the ith
effect are in one to one correspondence in the
case of 1 degree of freedom effects. This is not
in general the case for the $P_i$ and $F_i$ of tables
such as Table II, where the effect degrees of
freedom vary.)

Thus for Table I, suppose significance state- 
ments at the 1% level are desired. The steps in 
sequential testing are: 

1. Rank the effects via their observed
F values.


Rank    Effect

1       A, number of presentations
2       C, times of testings
3       B, modes of presentation
4       AC
5       BC
6       ABC
7       AB


2. The testing now proceeds sequentially. At
each step, the test is made by comparing the F
for a specified effect with F_{α'/k}, where α' is
the nominal significance level and k = c - r + 1
as before.

In the present example of Table I, for the test
of A, the first ranking effect, we have α' = .01
and k = 7 - 1 + 1 = 7. Also F_{α'/k} = F_{.01/7} =
F_{.0014} for 1 and 72 degrees of freedom is
approximately 11.0. Since A has a variance ratio
of F = 105.07, which is > F_{α'/7} = 11.0, the main
effect of A is significant.

The final simplification is conveniently
introduced at this point. It is unnecessary to carry
out the awkward interpolation for F_{α'/k} if the
α = 5% or 1% levels are employed, because of
the existence of a brief table due to Nair (5:164).
From Nair’s table, for k = 7 and 1 and 72 degrees
of freedom, the appropriate critical F
point is procured directly as F_{1%} = 11.0.

3. Next, the second ranking effect is tested,
significance being declared if the observed F is
> F_{α'/6} (since for r = 2, now k = 7 - 2 + 1 = 6),
non-significance otherwise.

Here the observed F corresponding to the 
main effect C is F = 64.28 which is > 10.5, the 
latter being obtained from Nair’s table for 1 and 
72 degrees of freedom as before, but now for 
k = 6. 

4. Proceeding in this fashion, the observed
F’s for B, AC and BC are tested against critical
F’s of 10.2, 10.1, and 9.6, obtained from Nair’s
table with, respectively, k = 5, 4, and 3 (i.e.,
k = 7 - r + 1 for r = 3, 4, and 5). Evidently
the last of these three effects, BC, with
an F of 4.22, is not significant, so the multiple
testing is complete.

Notice that a conventional analysis of variance
would employ a single critical F of 7.0 (for
degrees of freedom 1 and 72) and so conclude
from Table I that all effects are significant at
the 1% level with the exception of the AB and
ABC interactions. However, the Hartley sequential
test, while agreeing on the significance of A,
C, B and AC, does not return a verdict of
significance for BC.

The foregoing technique is applicable to any 
analysis of variance multiple comparison situa- 
tion of the orthogonal type. Thus, e.g., the re- 
cent exposition (2) of orthogonal polynomial re- 
sponse curve analysis might profitably be rean- 
alyzed according to the above principles. 


REFERENCES 


1. Edwards, A. L. Experimental Design in
Psychological Research (New York: Rinehart,
1950).

2. Grant, D. A. ‘‘Analysis-of-Variance Tests in
the Analysis and Comparison of Curves,’’
Psychological Bulletin, LIII (1956), pp.
141-54.

3. Hartley, H. O. ‘‘Some Recent Developments
in Analysis of Variance,’’ Communications
on Pure and Applied Mathematics, VIII
(1955), pp. 47-57.

4. Kogan, L. S. ‘‘Variance Designs in
Psychological Research,’’ Psychological
Bulletin, L (1953), pp. 1-40.

5. Pearson, E. S., and Hartley, H. O.
Biometrika Tables for Statisticians (Cambridge:
Cambridge University Press, 1954).








JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 26, March, 1958) 


A MODIFIED STANINE SCALE 


HENRY F. KAISER 
University of Illinois 


THE STANINE scale may be thought of as an 
areal transformation of the normal curve, for 
which the ‘‘basic unit’’ is abscissae values of 
one-half sigma under this curve. It may easily 
be shown that this basic unit forces the stanine 
scale to have a standard deviation of 1.945. It 
seems rather unfortunate that this standard de- 
viation does not take on an integer value, as is 
the case with other well established derived 
scales—e.g., T-scores, standard scores, etc. 
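
The figure quoted for the conventional scale can be checked by a short calculation. The following is a sketch under stated assumptions: the nine intervals are taken to be bounded at ±0.25, ±0.75, ±1.25 and ±1.75 sigma and scored X = 1, ..., 9 with mean 5, and the areas p_X are read from a table of the normal integral rather than from the text at this point.

```latex
\sigma_X^{2} \;=\; \sum_{X=1}^{9} (X-5)^{2}\, p_X
 \;=\; 2\,\bigl[\,16(.04006) + 9(.06559) + 4(.12098) + 1(.17467)\,\bigr]
 \;\approx\; 3.780,
\qquad
\sigma_X \;\approx\; 1.944 .
```

This agrees with the value of 1.945 given above to two decimals; the small difference in the third decimal presumably reflects rounding of the tabled areas.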

Let us then reverse our notion of what the 
basic unit in defining the stanine scale is to be, 
and define the stanine scale such that its stand- 
ard deviation is an integer. Specifically, let this 
integer be equal to two, in order to do as little 
violence as possible to the conventional scale. 

For this purpose we write: 


$$\sum_{X=1}^{9} (X-5)^{2}\,\frac{1}{\sqrt{2\pi}} \int_{t=(2X-11)a/2}^{t=(2X-9)a/2} e^{-t^{2}/2}\,dt \;=\; 4 \qquad (1)$$


except when t = ±9a/2, for which t = ±∞, and
where X = stanine score and a is the unknown ab- 
scissa value under the normal curve that gives the 
desired result. Upon consulting the table of the 
normal probability integral it is seen that 


a = 0.482 


is the solution (to three decimals) for equation
(1). Re-entering the tables we find the percentage
of cases falling within each stanine interval
for both the conventional and modified scale.
These results are given in Table I. It will be
noted that the proposed modification results in
very little change, certainly not enough to
warrant any qualitative re-evaluation of the scale,
while, on the other hand, giving us a scale which
is somewhat neater to handle quantitatively.


TABLE I


PERCENTAGE OF CASES FALLING WITHIN
EACH STANINE INTERVAL FOR BOTH THE
CONVENTIONAL AND MODIFIED SCALE


             Percentage of Cases in
                Stanine Interval

Stanine      Conventional    Modified

1                4.006         4.580
2                6.559         6.830
3               12.098        12.074
4               17.467        16.994
5               19.741        19.044
6               17.467        16.994
7               12.098        12.074
8                6.559         6.830
9                4.006         4.580

s_stanine        1.945         2.000
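
The solution for a, and the percentages of Table I, may be reproduced numerically. The sketch below is an illustration rather than the author's procedure (the text solves equation (1) by consulting the normal table): it assumes a mean stanine of 5, uses the error function for the normal integral, and finds a by simple bisection. All function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative normal probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stanine_proportions(a):
    """Proportion of cases in stanines 1..9 when the basic unit is a sigma."""
    props = []
    for x in range(1, 10):
        # Interval limits of equation (1); the extreme limits are infinite.
        lower = -math.inf if x == 1 else (2 * x - 11) * a / 2
        upper = math.inf if x == 9 else (2 * x - 9) * a / 2
        props.append(normal_cdf(upper) - normal_cdf(lower))
    return props


def stanine_sd(a):
    """Standard deviation of the stanine score (mean is 5 by symmetry)."""
    props = stanine_proportions(a)
    return math.sqrt(sum((x - 5) ** 2 * p for x, p in zip(range(1, 10), props)))


def solve_unit(target_sd=2.0, lower=0.3, upper=0.7, tol=1e-10):
    """Bisection for a; the standard deviation decreases as a increases."""
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if stanine_sd(mid) > target_sd:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2


a = solve_unit()
print(round(a, 3))
print([round(100 * p, 3) for p in stanine_proportions(0.482)])
print(round(stanine_sd(0.5), 3))
```

Calling stanine_proportions(0.5) gives the conventional column of Table I in the same way, so the two scales can be compared interval by interval.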